## Busy Mondays

Posted in Statistics, University life on September 30, 2010 by xi'an

There is now a combination of no fewer than three generic statistics seminars alternating on Monday afternoons in Paris.

Fortunately, Pierre Alquier has made a Google calendar that summarises where and when to go!

## Le Monde puzzle [38]

Posted in R, University life on September 30, 2010 by xi'an

Since I have resumed my R class, I will restart solving the Le Monde mathematical puzzles, as they make good exercises for the class. The puzzle this week is not that exciting:

Find the four different non-zero digits a, b, c, d such that the number abcd is equal to the sum of all two-digit numbers made by picking without replacement two digits from {a,b,c,d}.

The (my) dumb solution is to proceed by enumeration:

for (a in 1:9){
  for (b in (1:9)[-a]){
    for (c in (1:9)[-c(a,b)]){
      for (d in (1:9)[-c(a,b,c)]){
        if (33*sum(c(a,b,c,d))==sum(10^(3:0)*c(a,b,c,d)))
          print(c(a,b,c,d))
}}}}

taking advantage of the fact that each digit appears three times in the tens place and three times in the units place, so that the sum of all two-digit numbers is 3 x 11 = 33 times the sum a+b+c+d, but there is certainly a cleverer way to solve the puzzle (even though past experience has shown that this was not always the case!)
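For what it is worth, the puzzle as literally stated can also be checked by summing the twelve two-digit numbers explicitly, without any closed-form multiplier; this quick Python sketch (a mere translation of the enumeration, not a cleverer method) shows that the condition, read literally, admits no solution, since the pairwise sum is at most 33 × (9+8+7+6) = 990 and thus can never reach a four-digit number:

```python
from itertools import permutations

# Brute-force check of the puzzle exactly as stated: find distinct non-zero
# digits a, b, c, d with 1000a+100b+10c+d equal to the sum of all two-digit
# numbers 10x+y over ordered pairs of distinct digits x, y from {a,b,c,d}.
solutions = []
for a, b, c, d in permutations(range(1, 10), 4):
    digits = (a, b, c, d)
    pair_sum = sum(10*x + y for x in digits for y in digits if x != y)
    # each digit occurs three times in each position, so the sum is 33*(a+b+c+d)
    assert pair_sum == 33 * sum(digits)
    if pair_sum == 1000*a + 100*b + 10*c + d:
        solutions.append(digits)

# the sum is at most 33*(9+8+7+6) = 990 < 1000, hence no four-digit match
print(solutions)  # []
```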

Posted in Statistics, University life on September 29, 2010 by xi'an

“Logical overlap is the norm for the complex models analyzed with ABC, so many ABC posterior model probabilities published to date are wrong.” Alan R. Templeton, PNAS, doi:10.1073/pnas.1009012107

Our letter in PNAS about Templeton's surprising diatribe against Bayesian inference has now appeared in the early edition, along with Templeton's reply. This reply unfortunately brings no novel element compared with the original paper. First, he maintains that the criticism is about ABC (which, in case you do not know, is a computational technique and not a specific statistical methodology!). Second, he insists on the inappropriate Venn diagram analogy by reproducing the basic identity

$P(A\cup B\cup C) = P(A)+P(B)+P(C)-P(A\cap B)-P(B\cap C)-P(C\cap A)+P(A\cap B\cap C)$

(presumably in case we had lost sight of it!) to argue that using instead

$P(A)+P(B)+P(C)$

is incoherent (hence rejecting Bayes factors, Bayesian model averaging, and so on). I am not particularly surprised by this immutable stance, but it means there is little point in debating when starting from such positions… Our main goal in publishing this letter was actually to stress that the earlier tribune had no statistical grounding, and I think we achieved this goal.
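For completeness, the inclusion-exclusion identity Templeton reproduces can be checked mechanically on a small finite probability space; the uniform space and the three overlapping events below are hypothetical, chosen only to make the overlaps visible:

```python
from fractions import Fraction

# Uniform probability space over 12 outcomes; A, B, C are hypothetical
# overlapping events used purely for illustration.
omega = set(range(12))
A, B, C = set(range(0, 6)), set(range(4, 9)), set(range(7, 12))

def prob(event):
    # P(E) = |E| / |Omega| under the uniform measure
    return Fraction(len(event), len(omega))

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(B & C) - prob(C & A)
       + prob(A & B & C))
print(lhs == rhs)                    # inclusion-exclusion holds: True
print(prob(A) + prob(B) + prob(C))   # 4/3 > 1: the raw sum double-counts overlaps
```

The raw sum P(A)+P(B)+P(C) exceeds 1 exactly because the overlaps are counted twice, which is what the inclusion-exclusion corrections remove.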

## Publication from the frontier

Posted in Books, Statistics, Travel on September 29, 2010 by xi'an

In conjunction with the conference in San Antonio last March, I have received the book Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger edited by Ming-Hui Chen (University of Connecticut), Dipak K. Dey (University of Connecticut), Peter Müller (University of Texas M. D. Anderson Cancer Center), Dongchu Sun (University of Missouri-Columbia) and Keying Ye (University of Texas at San Antonio), who, incidentally, were PhD students of Jim Berger at the time I visited Purdue University. The book has been edited in depth, so it reads very well, with contributions grouped into chapters. Here is the table of contents:

1. Introduction.
2. Objective Bayesian inference with applications.
3. Bayesian decision based estimation and predictive inference.
4. Bayesian model selection and hypothesis tests.
5. Bayesian computer models.
6. Bayesian nonparametrics and semi-parametrics.
7. Bayesian case influence and frequentist interface.
8. Bayesian clinical trials.
9. Bayesian methods for genomics, molecular, and systems biology.
10. Bayesian data mining and machine learning.
11. Bayesian inference in political and social sciences, finance, and marketing.
12. Bayesian categorical data analysis.
13. Bayesian geophysical, spatial, and temporal statistics.
14. Posterior simulation and Monte Carlo methods.

whose final chapter (the only one, apart from the introduction, missing Bayesian from its title!) contains our contribution with Jean-Michel Marin.

## Galton & simulation

Posted in Books, R, Statistics on September 28, 2010 by xi'an

Stephen Stigler has written a paper in the Journal of the Royal Statistical Society Series A on Francis Galton's analysis of his cousin Charles Darwin's Origin of Species, leading to nothing less than Bayesian analysis and accept-reject algorithms!

“On September 10th, 1885, Francis Galton ushered in a new era of Statistical Enlightenment with an address to the British Association for the Advancement of Science in Aberdeen. In the process of solving a puzzle that had lain dormant in Darwin’s Origin of Species, Galton introduced multivariate analysis and paved the way towards modern Bayesian statistics. The background to this work is recounted, including the recognition of a failed attempt by Galton in 1877 as providing the first use of a rejection sampling algorithm for the simulation of a posterior distribution, and the first appearance of a proper Bayesian analysis for the normal distribution.”

The point of interest is that Galton proposes, through his (multi-stage) quincunx apparatus, a way to simulate from the posterior of a normal mean (here is an R link to the original quincunx). This quincunx has a vertical screen at the second level that acts as a way to physically incorporate the likelihood (it also translates the fact that the likelihood lives in another "orthogonal" space, compared with the prior!):

“Take another look at Galton’s discarded 1877 model for natural selection (Fig. 6). It is nothing less than a workable simulation algorithm for taking a normal prior (the top level) and a normal likelihood (the natural selection vertical screen) and finding a normal posterior (the lower level, including the rescaling as a probability density with the thin front compartment of uniform thickness).”

Besides a simulation machinery (steampunk Monte Carlo?!), it also offers the enormous appeal of proposing the derivation of the normal-normal posterior for the very first time:

“Galton was not thinking in explicit Bayesian terms, of course, but mathematically he has posterior $\mathcal{N}(0,C^2)\propto\mathcal{N}(0,A^2)\times f(x=0|y)$. This may be the earliest appearance of this calculation; the now standard derivation of a posterior distribution in a normal setting with a proper normal prior. Galton gave the general version of this result as part of his 1885 development, but the 1877 version can be seen as an algorithm employing rejection sampling that could be used for the generation of values from a posterior distribution. If we replace $f(x)$ above by the density $\mathcal{N}(a,B^2)$, his algorithm would generate the posterior distribution of Y given X=a, namely $\mathcal{N}(aC^2/B^2, C^2)$. The assumption of normality is of course needed for the particular formulae here, but as an algorithm the normality is not essential; posterior values for any prior and any location parameter likelihood could in principle be generated by extending this algorithm.”
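Galton's two-stage mechanism can be sketched as a rejection sampler in a few lines of Python (my own toy parameterisation, with prior standard deviation A, likelihood standard deviation B, and observation x = 0 as in the quote): draw from the prior, accept with probability proportional to the likelihood, and the surviving draws follow the normal-normal posterior.

```python
import math
import random

def quincunx_posterior(n, prior_sd, lik_sd, x=0.0, seed=42):
    """Rejection sampler mimicking Galton's 1877 two-stage quincunx:
    top level = normal prior, vertical screen = normal likelihood."""
    random.seed(seed)
    draws = []
    while len(draws) < n:
        y = random.gauss(0.0, prior_sd)          # prior draw, N(0, A^2)
        # accept with probability f(x|y) / max_y f(x|y) = exp(-(x-y)^2 / (2 B^2))
        if random.random() < math.exp(-(x - y) ** 2 / (2 * lik_sd ** 2)):
            draws.append(y)
    return draws

A, B = 1.0, 1.0
draws = quincunx_posterior(50_000, A, B)
# conjugacy: the posterior is N(0, C^2) with 1/C^2 = 1/A^2 + 1/B^2
C = (1 / (1 / A**2 + 1 / B**2)) ** 0.5
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5
print(round(C, 3), round(sd, 3))  # empirical sd should sit close to C ~ 0.707
```

The empirical standard deviation of the accepted draws settles near $C$ with $1/C^2=1/A^2+1/B^2$, the conjugate normal posterior scale quoted above.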

## Riemann, Langevin & Hamilton [reply]

Posted in R, Statistics, University life on September 27, 2010 by xi'an

Here is a (prompt!) reply from Mark Girolami corresponding to the earlier post:


It might be easier to look at the two attached figures rather than the isocontours in the complete phase space (q, p) of the 1-d Gaussian shown in the picture you include in your post. Both figures relate to sampling from a zero-mean bivariate Gaussian with covariance matrix $\Sigma$, marginal variances 1 and cross-covariance $\rho = 0.6$ (example taken from Neal, 2010). The first figure has three panels showing the 20 integration steps obtained by standard HMC, where the metric tensor for the space of bivariate Gaussians is NOT employed: the leftmost plot shows the proposal effectively moving from (-1.5, -1) to (1.5, 1.5), basically a large traversal of the 2-d sampling space. The middle plot shows the corresponding momentum variable steps, and the right plot shows the total Hamiltonian value (energy, or negative joint log-likelihood), where the difference between the start and the end of the integration is very small, indicating a high probability of acceptance. That is all fine: a large proposal step accepted with high probability.
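The near-conservation of the total Hamiltonian along the leapfrog trajectory, which is what makes such a large traversal acceptable with high probability, can be reproduced in a short numerical sketch; the target and the 20 integration steps follow the description above, while the starting point, momentum and step size are my own choices:

```python
# Target: zero-mean bivariate Gaussian, marginal variances 1, rho = 0.6
rho = 0.6
det = 1 - rho**2
# inverse covariance (precision) matrix, computed by hand
P = [[1/det, -rho/det], [-rho/det, 1/det]]

def grad_neglog(q):
    # gradient of -log target = P q
    return [P[0][0]*q[0] + P[0][1]*q[1], P[1][0]*q[0] + P[1][1]*q[1]]

def hamiltonian(q, p):
    # H(q, p) = q' P q / 2 + p' p / 2 (identity mass matrix, constants dropped)
    quad = q[0]*(P[0][0]*q[0] + P[0][1]*q[1]) + q[1]*(P[1][0]*q[0] + P[1][1]*q[1])
    return 0.5*quad + 0.5*(p[0]**2 + p[1]**2)

def leapfrog(q, p, eps, steps):
    q, p = list(q), list(p)
    for _ in range(steps):
        g = grad_neglog(q)
        p = [p[i] - 0.5*eps*g[i] for i in range(2)]   # half step on momentum
        q = [q[i] + eps*p[i] for i in range(2)]       # full step on position
        g = grad_neglog(q)
        p = [p[i] - 0.5*eps*g[i] for i in range(2)]   # half step on momentum
    return q, p

q0, p0 = [-1.5, -1.0], [1.0, 0.5]   # illustrative start, not taken from the paper
q1, p1 = leapfrog(q0, p0, eps=0.15, steps=20)
dH = hamiltonian(q1, p1) - hamiltonian(q0, p0)
print(dH)  # stays small relative to H(q0, p0) ~ 1.8, so min(1, exp(-dH)) ~ 1
```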

Now let's look at the second figure. Lay aside the Physics interpretation and let's try to adopt a geometric one instead, to see whether this helps the understanding. Our task is to simulate from this bivariate Gaussian; we know that the metric tensor $\Sigma^{-1}$ defines a flat manifold of bivariate densities. If we wanted to move from one point to another, we would wish to take the most direct route, the shortest path in the space, in other words the geodesic. The geodesics on a manifold are defined by a second order differential equation in terms of the coordinates (our sample space) and the Christoffel symbols of the manifold, so to make a proposed move along the geodesic we need to solve the differential equations describing it. Now, by rewriting the geodesic equation in a different form, we end up with Hamilton's equations: the solution flow of the geodesic equations coincides with the Hamiltonian flow, so solving Hamilton's equations is just another way to solve the geodesic equation. The variable p simply emerges from this rewriting of the geodesic equation and nothing more. In this case, as the manifold corresponding to bivariate Gaussians is flat and the metric tensor is the inverse of the covariance matrix, the geodesics will be elliptical paths defined by $\Sigma$, in other words the isocontours of the bivariate Gaussian. So the first panel in this figure shows 15 integration steps and, as can be observed, the path follows the elliptical isocontour of the target density; as the variable p is the dual in the geodesic equations, it also maps out an isocontour defined by the metric tensor.
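The coincidence between the geodesic flow and the Hamiltonian flow can be checked numerically in this flat bivariate Gaussian case: with constant metric $G=\Sigma^{-1}$, Hamilton's equations read $\dot\theta=\Sigma p$, $\dot p=-\Sigma^{-1}\theta$, hence $\ddot\theta=-\theta$, a harmonic (elliptical) motion with closed-form solution $\theta(t)=\theta_0\cos t+\Sigma p_0\sin t$. The sketch below (discretisation choices are my own) integrates the equations and compares with that solution:

```python
import math

# Flat-manifold case: constant metric G = Sigma^{-1}, so the Hamiltonian flow
# solves theta'' = -theta, with theta(t) = theta0 cos t + (Sigma p0) sin t.
rho = 0.6
S = [[1.0, rho], [rho, 1.0]]                     # Sigma
det = 1 - rho**2
Sinv = [[1/det, -rho/det], [-rho/det, 1/det]]    # Sigma^{-1}

def mv(M, v):
    # 2x2 matrix-vector product
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

theta, p = [1.0, 0.0], [0.0, 1.0]                # illustrative initial state
theta0, Sp0 = theta[:], mv(S, p)
h, T = 1e-3, 2.0
err = 0.0
for k in range(int(round(T / h))):
    # leapfrog on H = theta' Sinv theta / 2 + p' S p / 2
    g = mv(Sinv, theta)
    p = [p[i] - 0.5*h*g[i] for i in range(2)]
    Sp = mv(S, p)
    theta = [theta[i] + h*Sp[i] for i in range(2)]
    g = mv(Sinv, theta)
    p = [p[i] - 0.5*h*g[i] for i in range(2)]
    t = (k + 1) * h
    exact = [theta0[i]*math.cos(t) + Sp0[i]*math.sin(t) for i in range(2)]
    err = max(err, abs(theta[0]-exact[0]), abs(theta[1]-exact[1]))
print(err)  # maximum deviation from the closed-form harmonic solution stays tiny
```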

## Riemann, Langevin & Hamilton

Posted in Statistics, University life on September 27, 2010 by xi'an

In preparation for the Read Paper session next month at the RSS, our research group at CREST has collectively read the Girolami and Calderhead paper on Riemann manifold Langevin and Hamiltonian Monte Carlo methods, and I hope we will again produce a joint arXiv preprint out of our comments. (The above picture is reproduced from Radford Neal's talk at JSM 1999 in Baltimore, a talk I remember attending…) Although this only represents my preliminary/early impression of the paper, I have trouble with the Physics connection, because it involves continuous time events that are not transcribed directly into the simulation process.

Overall, trying to take advantage of second order properties of the target, just like the Langevin improvement takes advantage of the first order ones, is a natural idea which, when implementable, can obviously speed up convergence. This is the Langevin part, which may use a fixed metric M or a local metric G(θ) defining a Riemann manifold. So far, so good, assuming the derivation of an observed or expected information G(θ) is feasible up to some approximation level. The Hamiltonian part that confuses me introduces a dynamic on the level sets of

$\mathscr{H}(\theta,\mathbf{p})=-\mathcal{L}(\theta)+\frac{1}{2}\log\{(2\pi)^D|\mathbf{G}(\theta)|\}+\frac{1}{2}\mathbf{p}^\text{T}\mathbf{G}(\theta)^{-1}\mathbf{p}$

where p is an auxiliary vector of dimension D. Namely,

$\dot{\theta} = \dfrac{\partial \mathscr{H}}{\partial \mathbf{p}}(\theta,\mathbf{p})\,,\qquad\dot{\mathbf{p}}=-\dfrac{\partial \mathscr{H}}{\partial \theta}(\theta,\mathbf{p})\,.$

While I understand the purpose of the auxiliary vector, namely to speed up the exploration of the posterior surface by taking advantage of the additional energy provided by p, I fail to understand why it is such an issue that the discretised (Euler) approximation to Hamilton's equations is not available in closed form… The fact that the (deterministic) leapfrog integrator is not exact should not matter, since the discretisation error can be corrected by a Metropolis-Hastings acceptance step.
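That Metropolis-Hastings correction amounts to a single accept/reject at the end of each leapfrog trajectory; here is a minimal sketch for a standard normal target (a toy target of my own choosing, not one of the paper's examples):

```python
import math
import random

def hmc_standard_normal(n_iter, eps=0.2, steps=15, seed=1):
    """HMC for a N(0,1) target: leapfrog trajectories followed by a
    Metropolis-Hastings step that corrects the discretisation error."""
    random.seed(seed)
    theta, samples, accepted = 0.0, [], 0
    for _ in range(n_iter):
        p = random.gauss(0.0, 1.0)            # refresh the momentum
        q, r = theta, p
        H0 = 0.5*q*q + 0.5*r*r                # current total energy
        for _ in range(steps):                # leapfrog integration
            r -= 0.5*eps*q                    # gradient of -log N(0,1) is q
            q += eps*r
            r -= 0.5*eps*q
        H1 = 0.5*q*q + 0.5*r*r
        if random.random() < math.exp(min(0.0, H0 - H1)):   # MH correction
            theta, accepted = q, accepted + 1
        samples.append(theta)
    return samples, accepted / n_iter

samples, rate = hmc_standard_normal(20_000)
mean = sum(samples) / len(samples)
var = sum((s - mean)**2 for s in samples) / len(samples)
print(round(rate, 2), round(mean, 2), round(var, 2))
```

Because the leapfrog energy error is tiny here, the acceptance rate stays close to one, and the chain recovers the N(0,1) moments; the exactness of the correction does not depend on the integrator being exact.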

While the logistic example is mostly a toy problem (where importance sampling works extremely well, as shown in our survey with Jean-Michel Marin), the stochastic volatility model is more challenging, and the fact that the Hamiltonian scheme applies to the missing data (the volatilities) as well as to the three parameters of the model is quite interesting. I however wonder at the appeal of this involved scheme when considering that the full conditional of the volatility can be simulated exactly