## conditioning an algorithm

Posted in Statistics on June 25, 2021 by xi'an

A question of interest on X validated: given a (possibly black-box) algorithm simulating from a joint distribution with density [wrt a continuous measure] p(z,y), (how) is it possible to simulate from the conditional p(y|z⁰)? Which reminded me of a recent paper by Lindqvist et al. on conditional Monte Carlo. Which zooms in on the simulation of a sample X given the value of a sufficient statistic, T(X)=t, revolving around pivotal quantities and inversions à la fiducial statistics, following an earlier Biometrika paper by Lindqvist & Taraldsen, in 2005. The idea is to write

$X=\chi(U,\theta)\qquad T(X)=\tau(U,\theta)$

where U has a (known) distribution that does not depend on θ, to solve τ(u,θ)=t in θ for a given pair (u,t) with solution θ(u,t) and to generate u conditional on this solution. But this requires getting “under the hood” of the algorithm to such an extent that it does not answer the original question, or opens the door to other solutions using the expression for the joint density p(z,y)… In a purely black box situation, ABC appears as the natural if approximate solution.
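As a minimal sketch of that ABC route (not taken from the X validated thread), the black box is called repeatedly and a simulated y is kept only when its companion z falls within a tolerance of z⁰. The toy simulator `black_box`, the tolerance `eps` and the function name `abc_conditional` are all assumptions made for illustration.

```python
import random


def black_box():
    """Stand-in for the black-box simulator: one draw (z, y) from p(z, y).

    Toy assumption: z ~ N(0,1) and y | z ~ N(z,1), so that p(y | z0) is N(z0,1).
    """
    z = random.gauss(0.0, 1.0)
    y = random.gauss(z, 1.0)
    return z, y


def abc_conditional(simulate, z0, eps, n_accept):
    """Keep the y's whose companion z lies within eps of z0.

    The output is a sample from p(y | |z - z0| < eps), hence only an
    approximation of p(y | z0) that improves as eps decreases.
    """
    accepted = []
    n_sim = 0
    while len(accepted) < n_accept:
        z, y = simulate()
        n_sim += 1
        if abs(z - z0) < eps:
            accepted.append(y)
    return accepted, n_sim


random.seed(1)
ys, n_sim = abc_conditional(black_box, z0=1.0, eps=0.05, n_accept=2000)
# in this toy model the accepted y's should average close to z0 = 1
print(sum(ys) / len(ys), n_sim)
```

The number of calls `n_sim` grows like 1/eps for a one-dimensional z, which is the usual price of ABC.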

## scale matters [maths as well]

Posted in pictures, R, Statistics on June 2, 2021 by xi'an

A question from X validated on why an independent Metropolis sampler of a three component Normal mixture based on a single Normal proposal was failing to recover the said mixture…

When looking at the OP’s R code, I did not notice anything amiss at first glance (I was about to drive back from Annecy, hence did not look too closely) and reran the attached code with a larger variance in the proposal, which returned the above picture for the MCMC sample, close enough (?) to the target. Later, from home, I checked the code further and noticed that the Metropolis ratio was only using the ratio of the targets. Dividing by the ratio of the proposals made a significant (?) difference to the representation of the target.

More interestingly, the OP was fundamentally confused between independent and random-walk Rosenbluth algorithms, from using the wrong ratio to aiming at the wrong scale factor and average acceptance ratio, and furthermore challenged by the very notion of Hessian matrix, which is often suggested as a default scale.
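For the record, here is a minimal sketch of an independent Metropolis sampler with the complete ratio, namely the ratio of the targets divided by the ratio of the proposals. It is written in Python rather than the OP's R, and the mixture weights, means and proposal scale are made up for illustration, not those of the original question.

```python
import math
import random

# toy three-component Normal mixture with unit variances (assumed values)
WEIGHTS = (0.3, 0.4, 0.3)
MEANS = (-3.0, 0.0, 3.0)


def log_normal_pdf(x, mean, sd):
    """Log-density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def log_target(x):
    """Log-density of the mixture, computed by log-sum-exp for stability."""
    terms = [math.log(w) + log_normal_pdf(x, m, 1.0) for w, m in zip(WEIGHTS, MEANS)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def independent_metropolis(n_iter, prop_sd, x0=0.0):
    """Independent Metropolis sampler with a single N(0, prop_sd^2) proposal."""
    x = x0
    chain = []
    n_acc = 0
    for _ in range(n_iter):
        y = random.gauss(0.0, prop_sd)  # the proposal ignores the current state
        # ratio of the targets DIVIDED BY the ratio of the proposals, on the log scale
        log_ratio = (log_target(y) - log_target(x)) - (
            log_normal_pdf(y, 0.0, prop_sd) - log_normal_pdf(x, 0.0, prop_sd)
        )
        if math.log(1.0 - random.random()) < log_ratio:
            x = y
            n_acc += 1
        chain.append(x)
    return chain, n_acc / n_iter


random.seed(3)
chain, acc_rate = independent_metropolis(20000, prop_sd=4.0)
print(acc_rate, sum(chain) / len(chain))
```

In the independent case the proposal scale should cover the whole target and a high acceptance rate is the goal, in contrast with the random-walk version where the scale is tuned towards a moderate acceptance rate.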

## unbalanced sampling

Posted in pictures, R, Statistics on May 17, 2021 by xi'an

A question from X validated on sampling from an unknown density f when given both a sample from the density f restricted to a (known) interval A, say, and a sample from f restricted to the complement of A. Or at least on producing an estimate of the mass of A under f, p(A).

The problem sounds impossible to solve without an ability to compute the density value at a given value, since any convex combination αf¹+(1-α)f² would return the same two samples. Assuming continuity of the density f at the boundary point a between A and its complement, a desperate solution for p(A)/(1-p(A)) is to take the ratio of the two density estimates at the value a (complement over A, since the restricted densities are f/(1-p(A)) and f/p(A), respectively), which turns out not so poor an approximation if seemingly biased. This was surprising to me as kernel density estimates are notoriously bad at boundary points.
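A rough sketch of this desperate solution, assuming a toy setting where f is the standard Normal density and A=(-∞,a] with a=0.5: each restricted sample gets its own Gaussian kernel density estimate, evaluated at a. The helper names, the Silverman-type bandwidth and the sample sizes are illustrative choices, not those of the original experiment.

```python
import math
import random


def kde_at(point, sample, h):
    """Gaussian kernel density estimate of the sample's density at a point."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((point - s) / h) ** 2) for s in sample)


def rule_of_thumb_bandwidth(sample):
    """Silverman-type rule of thumb, 1.06 * sd * n^(-1/5) (an assumed default)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def estimate_mass(sample_in, sample_out, a):
    """Estimate p(A) from the two restricted density estimates at the boundary a.

    f_in estimates about f(a)/p(A) and f_out about f(a)/(1-p(A)), both up to
    a boundary bias, so that f_out / (f_in + f_out) targets p(A).
    """
    f_in = kde_at(a, sample_in, rule_of_thumb_bandwidth(sample_in))
    f_out = kde_at(a, sample_out, rule_of_thumb_bandwidth(sample_out))
    return f_out / (f_in + f_out)


def restricted_samples(a, n):
    """Toy data: n draws from N(0,1) restricted to x <= a and n draws restricted to x > a.

    Equal sizes on purpose, so that the sample sizes carry no information on p(A).
    """
    inside, outside = [], []
    while len(inside) < n or len(outside) < n:
        x = random.gauss(0.0, 1.0)
        if x <= a:
            if len(inside) < n:
                inside.append(x)
        elif len(outside) < n:
            outside.append(x)
    return inside, outside


random.seed(5)
a = 0.5
s_in, s_out = restricted_samples(a, 5000)
true_mass = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
print(estimate_mass(s_in, s_out, a), true_mass)
```

Both estimates lose about half of their mass at the boundary, which mostly cancels in the ratio and may explain why the approximation is not so poor.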

If f(x) can be computed [up to a constant] at an arbitrary x, it is obviously feasible to simulate from f and approximate p(A). But the problem is then moot as a resolution would not even need the initial samples. If exploiting those to construct a single kernel density estimate, this estimate can be used as a proposal in an MCMC algorithm. Surprisingly (?), using instead the empirical cdf as proposal does not work.
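To illustrate the kernel proposal, here is a sketch under the same kind of toy assumptions (f proportional to exp(-x²/2), A=(-∞,0.5], a fixed bandwidth h=0.3): the two restricted samples are pooled into a single Gaussian kernel density estimate, which serves as the proposal of an independent Metropolis sampler, and p(A) is approximated by the frequency of visits to A. All names and tuning values are hypothetical.

```python
import math
import random

A_BOUND = 0.5  # toy choice: A = (-inf, 0.5]


def unnorm_target(x):
    """f known up to a constant (toy assumption: standard Normal kernel)."""
    return math.exp(-0.5 * x * x)


def kde_density(x, data, h):
    """Density at x of the Gaussian kernel estimate built on the pooled data."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def kde_draw(data, h):
    """Simulate from the kernel estimate: pick a data point, add N(0, h^2) noise."""
    return random.choice(data) + random.gauss(0.0, h)


def kde_metropolis(data, h, n_iter):
    """Independent Metropolis sampler using the kernel estimate as proposal."""
    x = random.choice(data)
    wx = unnorm_target(x) / kde_density(x, data, h)  # importance weight of x
    chain = []
    for _ in range(n_iter):
        y = kde_draw(data, h)
        wy = unnorm_target(y) / kde_density(y, data, h)
        if random.random() * wx < wy:  # accept with probability min(1, wy / wx)
            x, wx = y, wy
        chain.append(x)
    return chain


def make_data(n):
    """Pool n draws from f restricted to A with n draws restricted to its complement."""
    inside, outside = [], []
    while len(inside) < n or len(outside) < n:
        x = random.gauss(0.0, 1.0)
        if x <= A_BOUND:
            if len(inside) < n:
                inside.append(x)
        elif len(outside) < n:
            outside.append(x)
    return inside + outside


random.seed(7)
data = make_data(100)
chain = kde_metropolis(data, h=0.3, n_iter=5000)
print(sum(1 for v in chain if v <= A_BOUND) / len(chain))  # estimate of p(A)
```

The pooled estimate gets the relative weights of A and its complement wrong, but the Metropolis correction takes care of it since f is available.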

## simulating Maxwell distribution

Posted in Books, Kids, R, Statistics, University life on April 22, 2021 by xi'an

A question that came out on X validated a few days ago is how to efficiently simulate from a distribution with density

x²φ(x).

(Obviously this density is already properly normalised since the second moment of the standard Normal  distribution is one.) The first solution that came out (by Jarle Tufto) exploits the fact that this density corresponds to a signed root of a χ²(3) variate. This is a very efficient proposal that requires a Gamma sampler and a random sign sampler. Since the cdf is available in closed form,

Φ(x)-xφ(x),

I ran a comparison with a numerical inversion, but this is much slower. I also tried an accept-reject version based on a Normal proposal with a larger variance, but even when optimising this variance, the running time was about twice as large. While checking Devroye (1986) for any possible if unlikely trick, I came upon this distribution twice (p.119 in an unsolved exercise, p.176 presented as the Maxwell distribution). With the remark that, if

X~x²φ(x),  then  Y=UX~φ(x).
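A quick check of this remark, assuming U is a Uniform(0,1) variate independent of X (an assumption left implicit here): for y>0, the density of Y=UX is

$f_Y(y)=\int_y^\infty \frac{1}{x}\,x^2\varphi(x)\,\text{d}x=\int_y^\infty x\varphi(x)\,\text{d}x=\varphi(y)$

since φ'(x)=-xφ(x), and the case y<0 follows by symmetry.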

Inverting this result leads to X being distributed as

sign(Y)√(Y²-2log(U)),

which recovers the original χ²(3) solution, if slightly (and mysteriously) increasing the simulation speed.
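Both exact solutions fit in a few lines. The following Python sketch (my timings were run in R, so this is only an illustration, with function names of my own choosing) uses the standard library Gamma generator for the χ²(3) variate, which is a Gamma variate with shape 3/2 and scale 2.

```python
import math
import random


def maxwell_chi2():
    """Signed square root of a chi-square(3) variate, i.e. a Gamma(3/2, scale 2) draw."""
    chi2 = random.gammavariate(1.5, 2.0)
    sign = 1.0 if random.random() < 0.5 else -1.0
    return sign * math.sqrt(chi2)


def maxwell_devroye():
    """Inversion of Devroye's remark: sign(Y) * sqrt(Y^2 - 2 log U)."""
    y = random.gauss(0.0, 1.0)
    u = 1.0 - random.random()  # in (0, 1], which avoids log(0)
    return math.copysign(math.sqrt(y * y - 2.0 * math.log(u)), y)


random.seed(9)
n = 100000
for sampler in (maxwell_chi2, maxwell_devroye):
    xs = [sampler() for _ in range(n)]
    # the second moment of x²φ(x) is the fourth moment of the Normal, that is, 3
    print(sampler.__name__, sum(x * x for x in xs) / n)
```

The second version works because Y² is a χ²(1) variate and -2log(U) an independent χ²(2) variate.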

## nested sampling: any prior anytime?!

Posted in Books, pictures, Statistics, Travel on March 26, 2021 by xi'an

A recent arXival by Justin Alsing and Will Handley on “nested sampling with any prior you like” caught my attention. If only because I was under the impression that some priors would not agree with nested sampling. Especially those putting positive weight on some fixed levels of the likelihood function, as well as improper priors.

“…nested sampling has largely only been practical for a somewhat restrictive class of priors, which have a readily available representation as a transform from the unit hyper-cube.”

Reading from the paper, it seems that the whole point is to demonstrate that “any proper prior may be transformed onto the unit hypercube via a bijective transformation.” Which seems rather straightforward if the transform is not otherwise constrained: use a logit transform in every direction. The paper gets instead into the rather fashionable direction of normalising flows as density representations. (Which suddenly reminded me of the PhD dissertation of Rob Cornish at Oxford, which I examined last year. Even though nested was not used there in the same understanding.) The purpose appearing later (in the paper) or in fine to express a random variable simulated from the prior as the (generative) transform of a Uniform variate, f(U). Resuscitating the simulation from an arbitrary distribution from first principles.
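For a prior with independent components and available quantile functions, the representation f(U) is simply a coordinate-wise inversion of the cdf, as in the following sketch. The prior (a Normal location and an Exponential scale) and the name `prior_transform` are illustrative assumptions of mine, not the paper's normalising flow construction, which is meant for priors without such closed forms.

```python
import math
import random
from statistics import NormalDist


def prior_transform(u):
    """Map a point u of the open unit square to (mu, sigma).

    Assumed toy prior: mu ~ N(0, 10^2) and sigma ~ Exp(1), independent,
    each obtained by applying its inverse cdf to one Uniform coordinate.
    """
    mu = NormalDist(0.0, 10.0).inv_cdf(u[0])
    sigma = -math.log(1.0 - u[1])
    return mu, sigma


random.seed(11)
# random.random() could in theory return exactly 0, which inv_cdf rejects
draws = [prior_transform((random.random(), random.random())) for _ in range(5)]
print(draws)
```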

“One particularly common scenario where this arises is when one wants to use the (sampled) posterior from one experiment as the prior for another”

But I remained uncertain about the need for this representation in implementing nested sampling, as I do not see how it helps in bypassing the hurdles of simulating from the prior constrained by increasing levels of the likelihood function. It would be helpful to construct normalising flows adapted to the truncated priors but I did not see anything related to this version in the paper.
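To make the hurdle concrete, here is a bare-bones nested sampling sketch on a toy problem of my own (a Uniform(-5,5) prior and a standard Normal likelihood, hence an evidence close to 0.1), where the constrained prior draws are produced by brute-force rejection from the prior. This is the textbook scheme with the deterministic shrinkage exp(-i/N), not the authors' implementation.

```python
import math
import random

LOW, HIGH = -5.0, 5.0  # toy prior: theta ~ Uniform(-5, 5)


def likelihood(theta):
    """Toy likelihood: standard Normal density in theta."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2 * math.pi)


def nested_sampling(n_live, n_iter):
    """Return the evidence estimate and the total number of constrained prior proposals."""
    live = [random.uniform(LOW, HIGH) for _ in range(n_live)]
    evidence, x_prev, n_draws = 0.0, 1.0, 0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda j: likelihood(live[j]))
        l_min = likelihood(live[worst])
        x_i = math.exp(-i / n_live)  # deterministic proxy for the remaining prior mass
        evidence += l_min * (x_prev - x_i)
        x_prev = x_i
        # prior draw constrained to {likelihood > l_min}, by rejection from the prior:
        # the expected number of attempts grows like 1 / x_i
        while True:
            n_draws += 1
            candidate = random.uniform(LOW, HIGH)
            if likelihood(candidate) > l_min:
                live[worst] = candidate
                break
    # contribution of the live points left at termination
    evidence += x_prev * sum(likelihood(t) for t in live) / n_live
    return evidence, n_draws


random.seed(12)
z_hat, n_draws = nested_sampling(n_live=100, n_iter=600)
print(z_hat, n_draws)
```

The count `n_draws` blows up with the number of iterations, and a closed-form map from the unit hypercube to the prior does nothing to reduce it.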

The cosmological application therein deals with the incorporation of recent measurements in the study of the ΛCDM cosmological model, that is, more recent than the CMB Planck dataset we played with 15 years ago. (Time flies, even in an expanding Universe!) Namely, the Baryon Oscillation Spectroscopic Survey and the SH0ES collaboration.