Archive for density estimator

manifold learning [BNP Seminar, 11/01/23]

Posted in Books, Statistics, University life on January 9, 2023 by xi'an

An incoming BNP webinar on Zoom by Judith Rousseau and Paul Rosa (U of Oxford), on 11 January at 1700 Greenwich time:

Bayesian nonparametric manifold learning

In high dimensions it is common to assume that the data have a lower dimensional structure. We consider two types of low dimensional structure: in the first case the data are assumed to be concentrated near an unknown low dimensional manifold, in the second case they are assumed to be possibly concentrated on an unknown manifold. In both cases neither the manifold nor the density is known. A typical example is that of noisy observations on an unknown low dimensional manifold.

We first consider a family of Bayesian nonparametric density estimators based on location-scale Gaussian mixture priors and we study the asymptotic properties of the posterior distribution. Our work shows in particular that non-conjugate location-scale Gaussian mixture models can adapt to complex geometries and spatially varying regularity when the density is supported near a low dimensional manifold.

In the second part of the talk we will also consider the case where the distribution is supported on a low dimensional manifold. In this non-dominated model, we study different types of posterior contraction rates: Wasserstein and L_1(\mu_\mathcal{M}), where \mu_\mathcal{M} is the Hausdorff measure on the manifold \mathcal{M} supporting the density. Some more generic results on Wasserstein contraction rates are also discussed.


integral theorems for Monte Carlo

Posted in Books, pictures, Statistics on August 12, 2021 by xi'an

Nhat Ho and Stephen G. Walker have just arXived a paper on the use of (Fourier) integral theorems for Monte Carlo estimators, following the earlier entry of Parzen: namely that for any integrable function,

m(y)=\frac{1}{(2\pi)^d}\int_{\mathbb R^d}\int_{\mathbb R^d}\cos(s^\text{T}(y-x))m(x)\text dx\text ds

which can be turned into an estimator of a density m based on a sample from m. This identity can be rewritten as

m(y)=\lim_{R\to\infty}\frac{1}{\pi^d}\int_{\mathbb R^d}\prod_{i=1}^d\dfrac{\sin(R(y_i-x_i))}{y_i-x_i}\;m(x)\,\text dx

and the paper generalises this identity to all cyclic functions, even though it establishes that sin is the optimal choice. After reading this neat result, I however remain uncertain as to how this could help with Monte Carlo integration.
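As a way of making the phrase "turned into an estimator" concrete, here is a minimal sketch of what I understand the construction to be: freeze R at a finite value in the second identity and replace the integral against m(x)dx with an average over the sample. The function names, the choice R=3, and the standard Normal test case are my own assumptions for illustration, not taken from the paper.

```python
import math
import random


def sin_kernel(u, R):
    """One-dimensional kernel sin(R u) / (pi u), with its limit R / pi at u = 0."""
    if abs(u) < 1e-12:
        return R / math.pi
    return math.sin(R * u) / (math.pi * u)


def fourier_density_estimate(y, sample, R):
    """Empirical version of the sin identity at a fixed, finite R.

    y is a sequence of d coordinates and sample a list of such sequences;
    the estimate averages, over the sample, the product of the
    one-dimensional sin kernels across the d coordinates.
    """
    total = 0.0
    for x in sample:
        total += math.prod(sin_kernel(yi - xi, R) for yi, xi in zip(y, x))
    return total / len(sample)


# Illustration on a standard Normal sample in dimension one (an assumed test case).
random.seed(0)
sample = [(random.gauss(0.0, 1.0),) for _ in range(5000)]
for point in (0.0, 1.0, 2.0):
    estimate = fourier_density_estimate((point,), sample, R=3.0)
    truth = math.exp(-0.5 * point ** 2) / math.sqrt(2.0 * math.pi)
    print(f"y={point:.1f}  estimate={estimate:.4f}  true density={truth:.4f}")
```

Since the sin kernel takes negative values, nothing prevents this estimate from being negative in the tails, and R plays the role of an inverse bandwidth that has to be tuned against the sample size.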

visualising bias and unbiasedness

Posted in Books, Kids, pictures, R, Statistics, University life on April 29, 2019 by xi'an

A question on X validated led me to wonder at the point made by Christopher Bishop in his Pattern Recognition and Machine Learning book about the MLE of the Normal variance being biased, as illustrated by the above graph that opposes the true (green) distribution of the data (made of two points) to the estimated (red) distribution. While it is true that the MLE under-estimates the variance on average, the pictures are cartoonish caricatures in the persistence of their deviance across three replicas. When looking at 10⁵ replicas, rather than three, and at samples of size 10, rather than 2, the distinction between using the MLE (left) and the unbiased estimator of σ² (right) is hardly visible.
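For readers who want to check the size of the bias numerically, here is a minimal sketch of such a simulation, written in Python rather than the R used for the pictures. The function name, the seed, and the reduced number of replicas (kept below 10⁵ for speed) are my own assumptions. The standard result it illustrates is that the MLE has expectation (n-1)σ²/n, hence σ²/2 when n=2 and 0.9σ² when n=10.

```python
import random


def average_variance_estimates(n, replicas, sigma=1.0, seed=0):
    """Average the MLE and the unbiased variance estimators over Normal samples.

    Each replica draws n observations from N(0, sigma^2); the MLE divides the
    centred sum of squares by n, the unbiased estimator divides it by n - 1.
    """
    rng = random.Random(seed)
    mle_total = 0.0
    unbiased_total = 0.0
    for _ in range(replicas):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(x) / n
        sum_of_squares = sum((xi - mean) ** 2 for xi in x)
        mle_total += sum_of_squares / n
        unbiased_total += sum_of_squares / (n - 1)
    return mle_total / replicas, unbiased_total / replicas


# Compare the two sample sizes discussed in the post, with true variance 1.
for n in (2, 10):
    mle, unbiased = average_variance_estimates(n, replicas=20000)
    print(f"n={n}: average MLE={mle:.3f}, average unbiased={unbiased:.3f}, "
          f"theoretical MLE mean={(n - 1) / n:.3f}")
```

The averages only speak to the bias; the spread of the individual estimates around them, which is what the pictures display, is a separate matter.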

When looking more specifically at the case n=2, the humongous variability of the density estimate completely dwarfs the bias issue:

Even when averaging over all 10⁵ replications, the difference is hard to spot (and both estimations are more dispersed than the truth!):
