intuition beyond a Beta property

Posted in Books, Kids, R, Statistics, University life with tags , , , on March 30, 2015 by xi'an

A self-study question on X validated exposed an interesting property of the Beta distribution:

If x is B(n,m) and y is B(n+½,m) then √xy is B(2n,2m)

While this can presumably be established by a mere change of variables, I could not carry the derivation till the end and used instead the moment generating function E[(XY)s/2] since it naturally leads to ratios of B(a,b) functions and to nice cancellations thanks to the ½ in some Gamma functions [and this was the solution proposed on X validated]. However, I wonder at a more fundamental derivation of the property that would stem from a statistical reasoning… Trying with the ratio of Gamma random variables did not work. And the connection with order statistics does not apply because of the ½. Any idea?

where did the normalising constants go?! [part 2]

Posted in R, Statistics, Travel with tags , , , , , , , on March 12, 2014 by xi'an

Coming (swiftly and smoothly) back home after this wonderful and intense week in Banff, I hugged my loved ones,  quickly unpacked, ran a washing machine, and  then sat down to check where and how my reasoning was wrong. To start with, I experimented with a toy example in R:

# true target is (x^.7(1-x)^.3) (x^1.3 (1-x)^1.7)
# ie a Beta(3,3) distribution

# samples from partial posteriors
N=10^5
sam1=rbeta(N,1.7,1.3)
sam2=rbeta(N,2.3,2.7)

# first version: product of density estimates
dens1=density(sam1,from=0,to=1)
dens2=density(sam2,from=0,to=1)
prod=dens1$y*dens2$y
# normalising by hand
prod=prod*length(dens1$x)/sum(prod) plot(dens1$x,prod,type="l",col="steelblue",lwd=2)

# second version: F-S & P's yin+yang sampling
# with weights proportional to the other posterior

subsam1=sam1[sample(1:N,N,prob=dbeta(sam1,2.3,2.7),rep=T)]
plot(density(subsam1,from=0,to=1),col="steelblue",lwd=2)

subsam2=sam2[sample(1:N,N,prob=dbeta(sam2,1.7,1.3),rep=T)]
plot(density(subsam2,from=0,to=1),col="steelblue",lwd=2)


and (of course!) it produced the perfect fits reproduced below. Writing the R code acted as a developing bath as it showed why we could do without the constants!

Of course”, because the various derivations in the above R code all are clearly independent from the normalising constant: (i) when considering a product of kernel density estimators, as in the first version, this is an approximation of

$\prod_{i=1}^k p_i(\theta)$

as well as of

$\prod_{ i}^k m_i(\theta)$

since the constant does not matter. (ii) When considering a sample from mi and weighting it by the product of the remaining true or estimated mj‘s, this is a sampling weighting resampling simulation from the density proportional to the product and hence, once again, the constants do not matter. At last, (iii) when mixing the two subsamples, since they both are distributed from the product density, the constants do not matter. As I slowly realised when running this morning (trail-running, not code-runninh!, for the very first time in ten days!), the straight-from-the-box importance sampling version on the mixed samples I considered yesterday (to the point of wondering out loud where did the constants go) is never implemented in the cited papers. Hence, the fact that

$\prod_i^k p_i(\theta)\propto \prod_{i}^k m_i(\theta)$

is enough to justify handling the target directly as the product of the partial marginals. End of the mystery. Anticlimactic end, sorry…

ABC for bivariate betas

Posted in Statistics, University life with tags , , , , , , , on February 19, 2014 by xi'an

Crakel and Flegal just arXived a short paper running ABC for doing inference on the parameters of two families of bivariate betas. And I could not but read it thru. And wonder why ABC was that necessary to handle the model. The said bivariate betas are defined from

$V_1=(U_1+U_5+U_7)/(U_3+U_6+U_8)\,,$

$V_2=(U_2+U_5+U_8)/(U_4+U_6+U_7)$

when

$U_i\sim \text{Ga}(\delta_i,1)$

and

$X_1=V_1/(1+V_1)\,,\ X_2=V_2/(1+V_2)$

This makes each term in the pair Beta and the two components dependent. This construct was proposed by Arnold and Ng (2011). (The five-parameter version cancels the gammas for i=3,4,5.)

Since the pdf of the joint distribution is not available in closed form, Crakel and Flegal zoom on ABC-MCMC as the method of choice and discuss simulation experiments. (The choice of the tolerance ε as an absolute rather than relative value, ε=0.2,0.0.6,0.8, puzzles me, esp. since the distance between the summary statistics is not scaled.) I however wonder why other approaches are impossible. (Or why it is necessary to use this distribution to model correlated betas. Unless I am confused copulas were invented to this effect.) First, this is a latent variable model, so latent variables could be introduced inside an MCMC scheme. A wee bit costly but feasible. Second, several moments of those distributions are known so a empirical likelihood approach could be considered.

beta HPD

Posted in Books, R, Statistics, Uncategorized, University life with tags , , , , , , , on October 17, 2013 by xi'an

While writing an introductory chapter on Bayesian analysis (in French), I came by the issue of computing an HPD region when the posterior distribution is a Beta B(α,β) distribution… There is no analytic solution and hence I resorted to numerical resolution (provided here for α=117.5, β=115.5):

f=function(p){

# find the symmetric
g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))}
return(uniroot(g,c(.504,.99))$root)} ff=function(alpha){ # find the coverage g=function(x){return(x-p*((1-p)/(1-x))^(115.5/117.5))} return(uniroot(g,c(.011,.49))$root)}


and got the following return:

> ff(.95)
[1] 0.4504879
> f(ff(.95))
[1] 0.5580267


which was enough for my simple book illustration… Since (.450,558) is then the HPD region at credible level 0.95.

A slice of infinity

Posted in R, Statistics, University life with tags , , , , , , , on July 28, 2011 by xi'an

Peng Yu sent me an email about the conditions for convergence of a Gibbs sampler:

The following statement mentions convergence. But I’m not familiar what the regularity condition is.

“But it is necessary to have a finite probability of moving away from the current state at all times in order to satisfy the regularity conditions on which the whole MCMC theory depends.”

Slice sampler is discussed in your book Monte Carlo Statistical Methods. I think that the “regularity condition” may have been discussed in your book. If so, would you please let me know where it is? Thanks and look forward to hearing from you!

The quote is from Martyn Plummer and deals with a stopping rule in JAGS implementation of the slice sampler. (The correct wording should be “strictly positive probability” rather than “finite probability”, I think.) However, this has nothing to do with a “regularity condition” on the irreducibility of a Markov chain: if a slice sampler is implemented for an unbounded density target, say a Beta(1/2,1/2), there is no irreducibility condition connected with the infiniteness of the density. In theory, (a) the chain never visits the “state” where the density is infinite (if only because we are dealing with a continuous state space) and (b) after visiting a value x with a large density f(x), the slice sampler allows for a move away from it since the slice involves a uniform simulation over (0,f(x)). Deeper properties of the slice sampler (like geometric ergodicity) are explored in, e.g., this JRSS B paper by Gareth Roberts and Jeff Rosenthal and this one in the Annals of Statistics by Radford Neal. In practice, the problem is caused by values of f(x) that cannot be computed and hence produce an error message like

Singularity in likelihood found by Slicer.

If those singularities can be localised, a neighbourhood excluding them should be introduced. (More easily said than done, obviously!)

Here is an example of a slice sampler with the Beta(1/2,1/2) distribution:

#graphics
dote=function(x,y) points(x,y,col="gold",pch=19,cex=.4)
mote=function(x,y,z,w) lines(c(x,z),c(y,w),col="gold",lwd=.5)
cst=dbeta(.5,.5,.5)*.5 #normalising constant
#inverting f(x)=d, 2nd degree equation
hitden=function(d) .5+.5*sqrt(1-4*( cst/ max(d,dbeta(.5,.5,.5)))^2)*c(-1,1)
#output
curve(dbeta(x,.5,.5),0,1,ylab="density",lwd=2,col="steelblue",n=1001)
x=runif(1);u=runif(1)*dbeta(x,.5,.5);dote(x,u)
for (t in 1:100){ #100 slice steps
bo=hitden(u)
nx=sample(c(runif(1,0,bo[1]),runif(1,bo[2],1)),1)
nu=runif(1)*dbeta(nx,.5,.5)
mote(x,u,nx,nu)
x=nx;u=nu;dote(x,u)
}


which clearly explores the whole area under the Beta(1/2,1/2) density. Even when started at a large density value like f(.999999), it eventually leaves the vicinity of this highly improbable value.