**F**or quite a while, I entertained the idea that Beta and Dirichlet proposals were more adequate than (log-)normal random walks proposals for parameters on (0,1) and simplicia (simplices, simplexes), respectively, when running an MCMC. For instance, for p in (0,1) the value of the Markov chain at time t-1, the proposal at time t could be a Be(εp,ε{1-p}) generator, since its mean is equal to p and its variance is proportional to 1/(1+ε). (Although I cannot find track of this notion in my books.) The parameter ε can be calibrated towards a given acceptance rate, like the golden number 0.234 of Gelman, Gilks and Roberts (1996). However, when using this proposal on a mixture model, Kaniav Kamari and myself realised today that there is a catch, namely that pushing ε down to achieve an acceptance rate near 0.234 may end up in disaster, since the parameters of the Beta or of the Dirichlet may become lower than 1, which implies an infinite explosion on some boundaries of the parameter space. An explosion that gets more and more serious as ε decreases to zero. Hence is more and more likely to decrease the acceptance rate, thus to reduce ε, which in turns concentrates even more the support on the boundary and leads to a vicious circle and no convergence to the target acceptance rate… Continue reading

## Archive for beta distribution

## debunking a (minor and personal) myth

Posted in Books, Kids, R, Statistics, University life with tags adaptive MCMC methods, beta distribution, Dirichlet prior, Gaussian random walk, Markov chains on September 9, 2015 by xi'an## ABC for bivariate betas

Posted in Statistics, University life with tags ABC, beta distribution, empirical likelihood, latent variable, MCMC algorithms, Monte Carlo Statistical Methods, multivariate Beta distribution, simulation on February 19, 2014 by xi'an**C**rakel and Flegal just arXived a short paper running ABC for doing inference on the parameters of two families of bivariate betas. And I could not but read it thru. And wonder why ABC was that necessary to handle the model. The said bivariate betas are defined from

when

and

This makes each term in the pair Beta and the two components dependent. This construct was proposed by Arnold and Ng (2011). (The five-parameter version cancels the gammas for i=3,4,5.)

**S**ince the pdf of the joint distribution is not available in closed form, Crakel and Flegal zoom on ABC-MCMC as the method of choice and discuss simulation experiments. (The choice of the tolerance ε as an absolute rather than relative value, ε=0.2,0.0.6,0.8, puzzles me, esp. since the distance between the summary statistics is not scaled.) I however wonder why other approaches are impossible. (Or why it is necessary to use this distribution to model correlated betas. Unless I am confused copulas were invented to this effect.) First, this is a latent variable model, so latent variables could be introduced inside an MCMC scheme. A wee bit costly but feasible. Second, several moments of those distributions are known so a empirical likelihood approach could be considered.

## beta HPD

Posted in Books, R, Statistics, Uncategorized, University life with tags beta distribution, book chapter, fREN, French paper, HPD region, pbeta(), R, uniroot() on October 17, 2013 by xi'an**W**hile writing an introductory chapter on Bayesian analysis (in French), I came by the issue of computing an HPD region when the posterior distribution is a Beta B(α,β) distribution… There is no analytic solution and hence I resorted to numerical resolution (provided here for α=117.5, β=115.5):

f=function(p){ # find the symmetric g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))} return(uniroot(g,c(.504,.99))$root)} ff=function(alpha){ # find the coverage g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))} return(uniroot(g,c(.011,.49))$root)}

and got the following return:

> ff(.95) [1] 0.4504879 > f(ff(.95)) [1] 0.5580267

which was enough for my simple book illustration… Since (.450,558) is then the HPD region at credible level 0.95.

## A slice of infinity

Posted in R, Statistics, University life with tags beta distribution, convergence, ergodicity, Gibbs sampling, irreducibility, JAGS, R, slice sampling on July 28, 2011 by xi'an**P**eng Yu sent me an email about the conditions for convergence of a Gibbs sampler:

The following statement mentions convergence. But I’m not familiar what the regularity condition is.

“But it is necessary to have a finite probability of moving away from the current state at all times in order to satisfy the regularity conditions on which the whole MCMC theory depends.”

Slice sampler is discussed in your book

. I think that the “regularity condition” may have been discussed in your book. If so, would you please let me know where it is? Thanks and look forward to hearing from you!Monte Carlo Statistical Methods

**T**he quote is from Martyn Plummer and deals with a stopping rule in ** JAGS** implementation of the slice sampler. (The correct wording should be “strictly positive probability” rather than “finite probability”, I think.) However, this has nothing to do with a “regularity condition” on the irreducibility of a Markov chain: if a slice sampler is implemented for an unbounded density target, say a

*Beta(1/2,1/2)*, there is no irreducibility condition connected with the infiniteness of the density. In theory, (a) the chain never visits the “state” where the density is infinite (if only because we are dealing with a continuous state space) and (b) after visiting a value

*x*with a large density

*f(x)*, the slice sampler allows for a move away from it since the slice involves a uniform simulation over

*(0,f(x))*. Deeper properties of the slice sampler (like geometric ergodicity) are explored in, e.g., this JRSS B paper by Gareth Roberts and Jeff Rosenthal and this one in the Annals of Statistics by Radford Neal. In practice, the problem is caused by values of

*f(x)*that cannot be computed and hence produce an error message like

Singularity in likelihood found by Slicer.

If those singularities can be localised, a neighbourhood excluding them should be introduced. (More easily said than done, obviously!)

Here is an example of a slice sampler with the *Beta(1/2,1/2)* distribution:

#graphics dote=function(x,y) points(x,y,col="gold",pch=19,cex=.4) mote=function(x,y,z,w) lines(c(x,z),c(y,w),col="gold",lwd=.5) cst=dbeta(.5,.5,.5)*.5 #normalising constant #inverting f(x)=d, 2nd degree equation hitden=function(d) .5+.5*sqrt(1-4*( cst/ max(d,dbeta(.5,.5,.5)))^2)*c(-1,1) #output curve(dbeta(x,.5,.5),0,1,ylab="density",lwd=2,col="steelblue",n=1001) x=runif(1);u=runif(1)*dbeta(x,.5,.5);dote(x,u) for (t in 1:100){ #100 slice steps bo=hitden(u) nx=sample(c(runif(1,0,bo[1]),runif(1,bo[2],1)),1) nu=runif(1)*dbeta(nx,.5,.5) mote(x,u,nx,nu) x=nx;u=nu;dote(x,u) }

which clearly explores the whole area under the *Beta(1/2,1/2)* density. Even when started at a large density value like *f(.999999)*, it eventually leaves the vicinity of this highly improbable value.

## Asher’s enigma

Posted in R, Statistics with tags beta distribution, puzzle, R-bloggers, unit square on July 26, 2010 by xi'an**O**n his Probability and statistics blog, Matt Asher put a funny question (with my rephrasing):

Take a unit square. Now pick two spots at random along the perimeter, uniformly. For each of these two locations, pick another random point from one of the three other sides of the square and draw the segment. What is the probability the two segments intersect? And what is the distribution for the intersection points?

**T**he (my) intuition for the first question was 1/2, but a quick computation led to another answer. The key to the computation is to distinguish whether or not both segments share one side of the square. They do with probability

in which case they intersect with probability 1/2. They occupy the four sides with probability 1/6, in which case they intersect with probability 1/3. So the final answer is 17/36 (as posted by several readers and empirically found by Matt). The second question is much more tricky: the histogram of the distribution of the coordinates is peaked towards the boundaries, thus reminding me of an arc-sine distribution, but there is a bump in the middle as well. Computing the coordinates of the intersection depending on the respective positions of the endpoints of both segments and simulating those distributions led me to histograms that looked either like beta B(a,a) distributions, or like beta B(1,a) distributions, or like beta B(a,1) distributions… Not exactly, though. So not even a mixture of beta distributions is enough to explain the distribution of the intersection points… For instance, the intersection points corresponding to segments were both segments start from the same side and end up in the opposite side are distributed as

where all *u*‘s are uniform on *(0,1)* and under the constraint . The following graph shows how well a beta distribution fits in that case. (Not perfectly, though!)

The R code is

u=matrix(runif(4*10^5),ncol=4)

u[,c(1,3)]=t(apply(u[,c(1,3)],1,sort))

u[,c(2,4)]=-t(apply(-u[,c(2,4)],1,sort))

y=(u[,1]*(u[,4]-u[,3])-u[,3]*(u[,2]-u[,1]))/(u[,1]+u[,4]-u[,2]-u[,3])

**S**imilarly, if the two segments start from the same side but end up on different sides, the distribution of one coordinate is given by

under the constraint . The outcome is once again almost distributed as a beta:

The corresponding R code is

u=matrix(runif(4*10^5),ncol=4)

u[,c(1,3)]=-t(apply(-u[,c(1,3)],1,sort))

y=(u[,1]*(1-u[,3])-u[,3]*u[,4]*(u[,2]-u[,1]))/(1-u[,3]-u[,4]*(u[,2]-u[,1]))