## unbalanced sampling

Posted in pictures, R, Statistics on May 17, 2021 by xi'an

A question from X validated on sampling from an unknown density f when given both a sample from the density f restricted to a (known) interval A, say, and a sample from f restricted to the complement of A. Or at least on producing an estimate of the mass of A under f, p(A).

The problem sounds impossible to solve without an ability to compute the density value at a given value, since any convex combination αf¹+(1-α)f² would return the same two samples. Assuming continuity of the density f at the boundary point a between A and its complement, a desperate solution for p(A)/(1-p(A)) is to take the ratio of the density estimates at the value a (the estimate from the complement sample over the one from the A sample, since the two restricted densities are f/(1-p(A)) and f/p(A)), which turns out to be not so poor an approximation, if seemingly biased. This was surprising to me as kernel density estimates are notoriously bad at boundary points. If f(x) can be computed [up to a constant] at an arbitrary x, it is obviously feasible to simulate from f and approximate p(A). But the problem is then moot as a resolution would not even need the initial samples. If those samples are exploited to construct a single kernel density estimate, this estimate can be used as a proposal in an MCMC algorithm. Surprisingly (?), using instead the empirical cdf as proposal does not work.
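As a rough illustration of the boundary-ratio idea, here is a minimal sketch in standard-library Python. Everything specific in it is an assumption made for the example rather than part of the original question: f is taken to be the standard normal, A is the half-line below a = 0.5, and the kernel estimate is a plain Gaussian one with a rule-of-thumb bandwidth. Since the density restricted to A is f/p(A) and the one restricted to the complement is f/(1-p(A)), the ratio of the two estimates at a targets p(A)/(1-p(A)).

```python
import random
from statistics import NormalDist, stdev

random.seed(2021)
STD = NormalDist()   # standard normal: assumed target f and also the kernel
a = 0.5              # assumed boundary point, A = (-inf, a)
n = 2000             # assumed size of each restricted sample


def draw_restricted(size, keep):
    """Rejection sampling from f restricted to the set where keep(x) is True."""
    out = []
    while len(out) < size:
        x = random.gauss(0.0, 1.0)
        if keep(x):
            out.append(x)
    return out


def kde_at(x, sample):
    """Gaussian kernel density estimate at x, rule-of-thumb bandwidth."""
    m = len(sample)
    h = 1.06 * stdev(sample) * m ** (-0.2)
    return sum(STD.pdf((x - s) / h) for s in sample) / (m * h)


sample_A = draw_restricted(n, lambda x: x < a)     # sample from f given A
sample_Ac = draw_restricted(n, lambda x: x >= a)   # sample from f given complement

# complement estimate over A estimate approximates p(A) / (1 - p(A))
odds_hat = kde_at(a, sample_Ac) / kde_at(a, sample_A)
p_hat = odds_hat / (1.0 + odds_hat)
p_true = STD.cdf(a)

print(f"estimated p(A) = {p_hat:.3f}, true p(A) = {p_true:.3f}")
```

Both estimates are evaluated at a hard edge of their own sample, so each is individually far from the restricted density value there; only their ratio is of interest in this sketch.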

## approximate lasso

Posted in pictures, R, Statistics on October 2, 2016 by xi'an

Here is a representation of the precision of a kernel density estimate (second axis) against the true value of the density (first axis), which looks like a lasso of sorts, hence the title. I am not sure this tells much, except that the estimated values are close to the true values and that a given value of f(x) is associated with two different estimates, predictably…
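The pairs behind such a picture can be computed along the following lines. This is only a sketch under assumptions the post does not state: a standard normal target, 1000 draws, a Gaussian kernel and a rule-of-thumb bandwidth. Plotting the second coordinate of `pairs` against the first would give the curve; because the true density is symmetric, x and -x share the same true value while their estimates generally differ, which is where the two branches come from.

```python
import random
from statistics import NormalDist, stdev

random.seed(2016)
STD = NormalDist()   # assumed true density, also used as the kernel
n = 1000             # assumed sample size

sample = [random.gauss(0.0, 1.0) for _ in range(n)]
h = 1.06 * stdev(sample) * n ** (-0.2)   # rule-of-thumb bandwidth


def kde(x):
    """Gaussian kernel density estimate at x."""
    return sum(STD.pdf((x - s) / h) for s in sample) / (n * h)


grid = [i / 10 for i in range(-30, 31)]        # evaluation points in [-3, 3]
pairs = [(STD.pdf(x), kde(x)) for x in grid]   # (true value, estimate)

max_err = max(abs(t - e) for t, e in pairs)
print(f"largest gap between estimate and truth on the grid: {max_err:.3f}")
```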

## miXed distributions

Posted in Books, Kids, Statistics, University life on November 3, 2015 by xi'an

A couple of questions on X validated showed the difficulty students have with mixed measures and their density. Actually, my students always react with incredulity to the likelihood of a censored normal sample or to the derivation of a Bayes factor associated with the null (and atomic) hypothesis μ=0…

I attribute this difficulty to a poor understanding of the notion of density and hence to a deficiency in the training in measure theory, since the density f of the distribution F is always relative to a reference measure dμ, i.e.

f(x) = dF/dμ(x)

(Hence Lebesgue’s moustache on the attached poster!) To handle atoms in the distribution requires introducing a dominating measure dμ with atomic components, i.e., usually a sum of the Lebesgue measure and of the counting measure on the appropriate set. Which is not so obvious: while the first question had {0,1} as atoms, the second question introduced atoms on {-θ,θ} and required a change of variable to consider a counting measure on {-1,1}. I found this second question actually of genuine interest and a great toy example for class and exams.
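To make the censored normal case concrete, here is a small sketch of its likelihood, written against the sum of the Lebesgue measure and a Dirac mass at the censoring point. The setting is an assumption chosen for illustration (unit variance, right-censoring at a known point c, and made-up values for μ, c and n): an uncensored observation contributes the normal density, while an observation sitting at c contributes the probability of exceeding c.

```python
import math
import random
from statistics import NormalDist

random.seed(1)
STD = NormalDist()
true_mu, c, n = 1.0, 1.5, 500   # hypothetical mean, censoring point, sample size

# observe min(X, c): a mixed distribution with an atom at c
sample = [min(random.gauss(true_mu, 1.0), c) for _ in range(n)]


def censored_loglik(mu, ys, cens):
    """Log-likelihood of mu under the dominating measure Lebesgue + Dirac at cens."""
    total = 0.0
    for y in ys:
        if y < cens:
            total += math.log(STD.pdf(y - mu))            # continuous part
        else:
            total += math.log(1.0 - STD.cdf(cens - mu))   # atom at cens
    return total


grid = [i / 100 for i in range(0, 201)]   # candidate values of mu in [0, 2]
mu_hat = max(grid, key=lambda m: censored_loglik(m, sample, c))
naive_mean = sum(sample) / n              # ignores the censoring

print(f"grid MLE = {mu_hat:.2f}, naive mean = {naive_mean:.2f}")
```

The naive average treats the censored values as if they were genuine observations, which is one way to see why the atom has to enter the density.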

## density()

Posted in R, Statistics on June 28, 2011 by xi'an

Following my earlier posts on the revision of Lack of confidence, here is an interesting outcome from the derivation of the exact marginal likelihood in the Laplace case. Computing the posterior probability of a normal model versus a Laplace model in the normal (gold) and the Laplace (chocolate) settings leads to the above histogram(s), which show(s) that the Bayesian solution is discriminating (in a frequentist sense), even for 21 observations. If instead I use R density() over the posterior probabilities, I get this weird and unmotivated flat density in the Laplace case. It looked as if the (frequentist) density of the posterior probability under the alternative was uniform, although there is no reason for this phenomenon!