Filed under: Books, Mountains, pictures, Statistics, Travel, University life Tagged: Bell Labs, Biometrika, Canada, Julian Besag, Metropolis-Hastings algorithm, obituary, Peskun ordering, University of Canterbury, University of Victoria, Victoria, Wilfred Keith Hastings ]]>

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

xprmt=function(n=10,T=50){ mean=0 for (t in 1:n){ board=rep(1,900) for (v in 1:T){ beard=rep(0,900) if (board[1]>0){ poz=c(0,1,0,30) ne=rmultinom(1,board[1],prob=(poz!=0)) beard[1+poz]=beard[1+poz]+ne} # for (i in (2:899)[board[-1][-899]>0]){ u=(i-1)%%30+1;v=(i-1)%/%30+1 poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30)) ne=rmultinom(1,board[i],prob=(poz!=0)) beard[i+poz]=beard[i+poz]+ne} # if (board[900]>0){ poz=c(-1,0,-30,0) ne=rmultinom(1,board[900],prob=(poz!=0)) beard[900+poz]=beard[900+poz]+ne} board=beard} mean=mean+sum(board==0)} return(mean/n)}

The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

> exprmt(n=1e3) [1] 331.082 > 900/exp(1) [1] 331.0915

Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has *no stationary distribution*, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

inva=function(B=30){ return(c(2,rep(3,B-2),2,rep(c(3, rep(4,B-2),3),B-2),2,rep(3,B-2),2))}

namely

> mn=0;n=1e8 #14 clock hours! > proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30) > for (t in 1:n) + mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4] > mn=mn/n > mn[1]=mn[1]-450 > mn 0 1 2 3 166.11 164.92 82.56 27.71 > exprmt(n=1e6) #55 clock hours!! [1] 165.36 165.69 82.92 27.57

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)

Filed under: Books, Kids, pictures, R, Statistics Tagged: coupling, cross validated, fleas, Leonhard Euler, Markov chains, Markov random field, Monte Carlo integration, occupancy, Poisson approximation, R, random walk, stationarity ]]>

If one uses the standard Stirling approximation to the factorial function,

log(N!)≈Nlog(N) – N + ½log(2πN)

the approximation to ℘ is 1/√πN, which is not perfect for the small values of N. Introducing the second order Stirling approximation,

log(N!)≈Nlog(N) – N + ½log(2πN) + 1/12N

the approximation become

℘≈exp(-1/8N)/√πN

which fits almost exactly from the start. This accuracy was already pointed out by William Feller, Section II.9.

Filed under: Kids, R, Statistics Tagged: American elections 2016, Stirling approximation, The Riddler ]]>

- if the targeted expectation only depends on one component of the Markov chain, multiplying the number of simulations for the other components has no clear impact, except in increasing time;
- if the corresponding full conditional is very concentrated, repeating simulations should produce quasi-repetitions, and no gain.

The only advantage in computing time that I can see at this stage is when constructing the MCMC sampler for the full proposal is much more costly than repeating MCMC iterations, which are then almost free and contribute to the reduction of the variance of the estimator.

This analysis of MCMC-withing-Gibbs strategies reminds me of a recent X validated question, which was about the proper degree of splitting simulations from a marginal and from a corresponding conditional in the chain rule, the optimal balance being in my opinion dependent on the relative variances of the conditional expectations.

A last point is that recycling in the context of simulation and Monte Carlo methodology makes me immediately think of Rao-Blackwellisation, which is surprisingly absent from the current paper. Rao-Blackwellisation was introduced in the MCMC literature and to the MCMC community in the first papers of Alan Gelfand and Adrian Smith, in 1990. While this is not always producing a major gain in Monte Carlo variability, it remains a generic way of recycling auxiliary variables as shown, e.g., in the recycling paper we wrote with George Casella in 1996, one of my favourite papers.

Filed under: Books, pictures, Statistics, University life Tagged: Adrian Smith, Alan Gelfand, conditional density, cross validated, George Casella, Gibbs sampling, MCMC algorithms, Metropolis-within-Gibbs algorithm, Monte Carlo Statistical Methods, Rao-Blackwellisation, simulation ]]>

“Regardless of the decision rule, however, the primary attraction of using parameter estimation to assess null values is that the an explicit posterior distribution reveals the relative credibility of all the parameter values.” (p.302)

After reading this paper, I realised that Kruschke meant something completely different, namely that a Bayesian approach to null hypothesis testing could operate from the posterior on the corresponding parameter, rather than to engage into formal Bayesian model comparison (null versus the rest of the World). The notion is to check whether or not the null value stands within the 95% [why 95?] HPD region [modulo a buffer zone], which offers the pluses of avoiding a Dirac mass at the null value and a long-term impact of the prior tails on the decision, with the minus of replacing the null with a tolerance region around the null and calibrating the rejection level. This opposition is thus a Bayesian counterpart of running tests on point null hypotheses either by Neyman-Pearson procedures or by confidence intervals. Note that in problems with nuisance parameters this solution requires a determination of the 95% HPD region associated with the marginal on the parameter of interest, which may prove a challenge.

“…the measure provides a natural penalty for vague priors that allow a broad range of parameter values, because a vague prior dilutes credibility across a broad range of parameter values, and therefore the weighted average is also attenuated.” (p. 306)

While I agree with most of the critical assessment of Bayesian model comparison, including Kruschke’s version of Occam’s razor [and Lindley’s paradox] above, I do not understand how Bayesian model comparison fails to return a full posterior on both the model indices [for model comparison] and the model parameters [for estimation]. To state that it does not because the Bayes factor only depends on marginal likelihoods (p.307) sounds unfair if only because most numerical techniques to approximate the Bayes factors rely on preliminary simulations of the posterior. The point that the Bayes factor strongly depends on the modelling of the alternative model is well-taken, albeit the selection of the null in the “estimation” approach does depend as well on this alternative modelling. Which is an issue if one ends up accepting the null value and running a Bayesian analysis based on this null value.

“The two Bayesian approaches to assessing null values can be unified in a single hierarchical model.” (p.308)

Incidentally, the paper briefly considers a unified modelling that can be interpreted as a mixture across both models, but this mixture representation completely differs from ours [where we also advocate estimation to replace testing] since the mixture is at the *likelihood x prior* level, as in O’Neill and Kypriaos.

Filed under: Books, pictures, Statistics Tagged: Bayes factors, Bayesian model comparison, Bayesian tests of hypotheses, cross validated, HPD region, John Kruschke, marginal likelihood ]]>

Filed under: Mountains, pictures, Running Tagged: Austria, Öxit, elections, EU, European Community, far-right ]]>

**T**oday the French National Assembly voted the rehabilitation of all victims of the repression of the Commune de Paris (March 18—May 28, 1871), which saw between 10,000 and 20,000 Parisian men and women summarily executed by troops of the provisional government of Adolphe Thiers during the Bloody Week.

Filed under: Books, Kids, pictures Tagged: 1871, Adolphe Thiers, Bloody Week, Commune de Paris, Paris ]]>

Filed under: Kids, pictures Tagged: abortion, church, French elections, French politics, Liberation, primaries, women rights ]]>

0≤u≤Φο**ƒ**[φοΦ⁻¹(u)v]

induces the distribution with density proportional to **ƒ** on φοΦ⁻¹(U)V. This set Λ has a boundary that is parameterised as

u=Φο**ƒ**(x), v=1/φο**ƒ**(x), x∈Χ

which remains bounded in u since Φ is a cdf and in v if φ has fat enough tails. At both 0 and ∞. When **ƒ** is the Gamma(a,1) density this can be achieved if φ behaves like log(x)² near zero and like a inverse power at infinity. Without getting into all the gory details, closed form density φ and cdf Φ can be constructed for all a’s, as shown for a=½ by the boundaries in u and v (yellow) below

which leads to a bounded associated set Λ

At this stage, I remain uncertain of the relevance of such derivations, if only because the set A thus derived is ill-suited for uniform draws proposed on the enclosing square box. And also because a Gamma(a,1) simulation can rather simply be derived from a Gamma(a+1,1) simulation. But, who knows?!, there may be alternative usages of this representation, such as innovative slice samplers. Which means the ratio-of-uniform method may reappear on the ‘Og one of those days…

Filed under: Books, pictures, R, Statistics, University life Tagged: Luc Devroye, Non-Uniform Random Variate Generation, random number generation, ratio of uniform algorithm, University of Warwick ]]>

* “L’exercice du pouvoir ne m’a jamais fait perdre ma lucidité. Aujourd’hui, je suis conscient des risques que ferait courir une démarche, la mienne, qui ne rassemblerait pas largement autour d’elle. Aussi, j’ai décidé de ne pas être candidat à l’élection présidentielle.”*

Filed under: pictures Tagged: 2017, Francois Hollande, French elections, primaries, succession ]]>