## unbalanced sampling

A question from X validated on sampling from an unknown density *f* when given both a sample from *f* restricted to a (known) interval *A*, *f¹* say, and a sample from *f* restricted to the complement of *A*, *f²* say. Or at least on producing an estimate of the mass of *A* under *f*, *p(A)*…

The problem sounds impossible to solve without an ability to compute the density at a given point, since any convex combination *αf¹+(1-α)f²* would return the same two samples. Assuming continuity of the density *f* at the boundary point *a* between *A* and its complement, a desperate solution for *p(A)/(1-p(A))* is to take the ratio of the density estimates at *a*, namely *f²(a)/f¹(a)*, since *f¹(a)=f(a)/p(A)* and *f²(a)=f(a)/(1-p(A))*. This turns out to be not so poor an approximation, if seemingly biased, which was surprising to me as kernel density estimates are notoriously bad at boundary points.
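As a minimal sketch of this boundary-ratio idea, here is a toy check in plain Python. Everything in it is an assumption made for illustration and not part of the original question: the target *f* is a standard Normal, *A=(-∞,a]* with *a=0.5* (so that *p(A)=Φ(0.5)≈0.69*), both samples have the same size regardless of *p(A)*, and a common hand-picked bandwidth *h=0.1* is used with an uncorrected Gaussian kernel. The helper names `gauss_kde_at`, `draw_truncated` and `estimate_mass` are hypothetical.

```python
import math
import random

def gauss_kde_at(x, sample, h):
    """Plain Gaussian kernel density estimate at x (no boundary correction)."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

def draw_truncated(n, keep, rng):
    """Toy generator: N(0,1) draws kept only when keep(x) is true."""
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if keep(x):
            out.append(x)
    return out

def estimate_mass(sample_in, sample_out, a, h):
    """Estimate p(A) from the odds f2_hat(a) / f1_hat(a) at the boundary a."""
    odds = gauss_kde_at(a, sample_out, h) / gauss_kde_at(a, sample_in, h)
    return odds / (1.0 + odds)

rng = random.Random(1)   # fixed seed, for reproducibility only
a = 0.5                  # assumed boundary between A and its complement
sample_in = draw_truncated(5000, lambda x: x <= a, rng)   # sample from f1
sample_out = draw_truncated(5000, lambda x: x > a, rng)   # sample from f2
p_hat = estimate_mass(sample_in, sample_out, a, h=0.1)
print("estimated p(A):", round(p_hat, 3))
```

One possible reason the ratio behaves better than each estimate taken separately: with a common bandwidth and a symmetric kernel, both estimates lose roughly half of their mass across the boundary, so that leading error may largely cancel in the ratio, while a slope of *f* at *a* would still leave a bias of order *h*.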

If *f(x)* can be computed [up to a constant] at an arbitrary *x*, it is obviously feasible to simulate from *f* and approximate *p(A)*. But the problem is then moot as a resolution would not even need the initial samples. If those samples are nonetheless exploited to construct a single kernel density estimate, this estimate can be used as a proposal in an MCMC algorithm. Surprisingly (?), using instead the empirical cdf as proposal does not work.
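The kernel-proposal idea can be sketched as an independence Metropolis-Hastings sampler, again in plain Python and under assumptions that are mine rather than the question's: the target is known only through a hypothetical unnormalised log-density `log_f` (here a standard Normal), *A=(-∞,a]* with *a=0.5*, the two truncated samples are pooled with equal sizes, and the bandwidth *h=0.3* is hand-picked. The names `kde`, `mh_mass` and `draw_truncated` are illustrative only.

```python
import math
import random

def kde(x, sample, h):
    """Gaussian kernel density estimate at x, used as proposal density."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

def mh_mass(log_f, pooled, a, h, n_iter, rng):
    """Independence Metropolis-Hastings with the KDE of `pooled` as proposal.

    Returns the fraction of chain values <= a, an estimate of p(A)
    for A = (-inf, a].
    """
    x = pooled[0]
    q_x = kde(x, pooled, h)
    hits = 0
    for _ in range(n_iter):
        # Simulating from a Gaussian KDE: pick a data point, add kernel noise.
        y = rng.choice(pooled) + rng.gauss(0.0, h)
        q_y = kde(y, pooled, h)
        # Acceptance ratio f(y) q(x) / (f(x) q(y)), computed on the log scale.
        log_ratio = log_f(y) - log_f(x) + math.log(q_x) - math.log(q_y)
        if math.log(1.0 - rng.random()) < log_ratio:
            x, q_x = y, q_y
        hits += 1 if x <= a else 0
    return hits / n_iter

def draw_truncated(n, keep, rng):
    """Toy generator: N(0,1) draws kept only when keep(x) is true."""
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if keep(x):
            out.append(x)
    return out

rng = random.Random(2021)   # fixed seed, for reproducibility only
a = 0.5
pooled = (draw_truncated(200, lambda x: x <= a, rng)
          + draw_truncated(200, lambda x: x > a, rng))
p_mcmc = mh_mass(lambda x: -0.5 * x * x, pooled, a, h=0.3, n_iter=4000, rng=rng)
print("MCMC estimate of p(A):", round(p_mcmc, 3))
```

The pooled sample over-represents the complement of *A* relative to *f*, and the accept-reject step is what corrects for this imbalance, which is why the proposal density must be evaluated at both the current and the proposed values.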
