Archive for cross validated

unbalanced sampling

Posted in pictures, R, Statistics with tags , , , , , , , on May 17, 2021 by xi'an


A question from X validated on sampling from an unknown density f when given both a sample from the density f restricted to a (known) interval A , say, and a sample from f restricted to the complement of A, say. Or at least on producing an estimate of the mass of A under f, p(A)

The problem sounds impossible to solve without an ability to compute the density value at a given value, since  any convex combination αf¹+(1-α)f² would return the same two samples. Assuming continuity of the density f at the boundary point a between A and its complement, a desperate solution for p(A)/1-p(A) is to take the ratio of the density estimates at the value a, which turns out not so poor an approximation if seemingly biased. This was surprising to me as kernel density estimates are notoriously bad at boundary points.

If f(x) can be computed [up to a constant] at an arbitrary x, it is obviously feasible to simulate from f and approximate p(A). But the problem is then moot as a resolution would not even need the initial samples. If exploiting those to construct a single kernel density estimate, this estimate can be used as a proposal in an MCMC algorithm. Surprisingly (?), using instead the empirical cdf as proposal does not work.

warped Cauchys

Posted in Books, Kids, R, Statistics with tags , , , on May 4, 2021 by xi'an

A somewhat surprising request on X validated about the inverse cdf representation of a wrapped Cauchy distribution. I had not come across this distribution, but its density being

f_{WC}(\theta;\gamma)=\sum_{n=-\infty}^\infty \frac{\gamma}{\pi(\gamma^2+(\theta+2\pi n)^2)}\mathbb I_{ -\pi<\theta<\pi}

means that it is the superposition of shifted Cauchys on the unit circle (with nice complex representations). As such, it is easily simulated by re-shifting a Cauchy back to (-π,π), i.e. using the inverse transform

\theta = [\gamma\tan(\pi U-\pi/2)+\pi]\ \text{mod}\,(2\pi) - \pi

a common confusion between sample and population moments

Posted in Books, Kids, R, Statistics with tags , , , , , , , on April 29, 2021 by xi'an

simulating Maxwell distribution

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , on April 22, 2021 by xi'an

A question that came out on X validated a few days ago is how to efficiently simulate from a distribution with density

x²φ(x).

(Obviously this density is already properly normalised since the second moment of the standard Normal  distribution is one.) The first solution that came out (by Jarle Tufto) exploits the fact that this density corresponds to a signed root of a χ²(3) variate. This is a very efficient proposal that requires a Gamma sampler and a random sign sampler. Since the cdf is available in closed form,

Φ(x)-xφ(x),

I ran a comparison with a numerical inversion, but this is much slower. I also tried an accept-reject version based on a Normal proposal with a larger variance, but even when optimising this variance, the running time was about twice as large. While checking Devroye (1986) for any possible if unlikely trick, I came upon this distribution twice (p.119 in an unsolved exercise, p.176 presented as the Maxwell distribution). With the remark that, if

X~x²φ(x),  then  Y=UX~φ(x).

Inverting this result leads to X being distributed as

sign(Y)√(Y²-2log(U)),

which recovers the original χ²(3) solution, if slightly (and mysteriously) increasing the simulation speed.

ratio of Gaussians

Posted in Books, Statistics, University life with tags , , , , , , , , on April 12, 2021 by xi'an

Following (as usual) an X validated question, I came across two papers of George Marsaglia on the ratio of two arbitrary (i.e. unnormalised and possibly correlated) Normal variates. One was a 1965 JASA paper,

where the density of the ratio X/Y is exhibited, based on the fact that this random variable can always be represented as (a+ε)/(b+ξ) where ε,ξ are iid N(0,1) and a,b are constant. Surprisingly (?), this representation was challenged in a 1969 paper by David Hinkley (corrected in 1970).

And less surprisingly the ratio distribution behaves almost like a Cauchy, since its density is

meaning it is a two-component mixture of a Cauchy distribution, with weight exp(-a²/2-b²/2), and of an altogether more complex distribution ƒ². This is remarked by Marsaglia in the second 2006 paper, although the description of the second component remains vague, besides a possible bimodality. (It could have a mean, actually.) The density ƒ² however resembles (at least graphically) the generalised Normal inverse density I played with, eons ago.