Archive for EM algorithm

inference with insufficient statistics #2

Posted in Books, Kids, Statistics on December 28, 2023 by xi'an

Another X validated question of some interest: how to infer the parameters of a model when only given a fraction 1-α of the order statistics, for instance the (1-α)n largest observations. At a primary level, the answer is somewhat obvious since the joint density of these observations is available in closed form. At another level, it brings out the fact that the distribution of the unobserved part of the sample given the observed one only depends on the smallest observed order statistic ς (which is thus sufficient in that sense) and ends up being the original distribution right-truncated at ς, which allows for a closed-form EM implementation. Which is also interesting given that the moments of a Normal order statistic are not available in closed form. This reminded me of the insufficient Gibbs paper we wrote with Antoine and Robin a few months ago, except that the likelihood is available here. And provided fodder for the final exam of my introductory mathematical statistics course at Paris Dauphine.
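As a side note, and not part of the original exam, here is a minimal Python sketch of that closed-form EM, assuming Normal observations and relying on the standard truncated-Normal moment formulas for the E step (function and variable names are mine, purely illustrative):

import numpy as np
from scipy.stats import norm

def em_censored_normal(obs, n, n_iter=200):
    # EM for N(mu, sigma^2) when only the largest len(obs) out of n order
    # statistics are observed; the missing values are conditionally iid from
    # the Normal right-truncated at the smallest observed order statistic.
    m = n - len(obs)                  # number of unobserved (smallest) values
    zeta = obs.min()                  # smallest observed order statistic
    mu, sig = obs.mean(), obs.std()   # crude starting values
    for _ in range(n_iter):
        # E step: first two moments of N(mu, sig^2) restricted to (-inf, zeta)
        a = (zeta - mu) / sig
        lam = norm.pdf(a) / norm.cdf(a)          # inverse Mills ratio
        ez = mu - sig * lam                      # E[Z | Z < zeta]
        vz = sig**2 * (1 - a * lam - lam**2)     # Var[Z | Z < zeta]
        ez2 = vz + ez**2
        # M step: complete-data Normal MLEs with the missing part imputed in expectation
        mu = (obs.sum() + m * ez) / n
        sig = np.sqrt((np.sum(obs**2) + m * ez2) / n - mu**2)
    return mu, sig

# toy illustration: simulate n=500 Normal variates and discard the 150 smallest
rng = np.random.default_rng(1)
x = np.sort(rng.normal(2.0, 1.5, 500))
print(em_censored_normal(x[150:], n=500))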

approximate computation for exact statistical inference from differentially private data

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on September 10, 2023 by xi'an

“the employment of ABC for differentially private data serendipitously eradicates the “approximate” nature of the resulting posterior samples, which otherwise would be the case if the data were noise-free.”

In parallel or conjunction with the 23w5601 workshop, I was reading some privacy literature and came across Exact inference with approximate computation for differentially private data via perturbation by Ruobin Gong, which appeared in the Journal of Privacy and Confidentiality last year (2022). When differential privacy is implemented by perturbation, i.e. by replacing the private data with a randomised, usually Gaussian, version, the exact posterior distribution involves a convolution of the model with the noise which, if unavailable in closed form, can be approximated by a standard ABC step. Which, most interestingly, does not impact the accuracy of the (public) posterior, i.e. it does not modify this posterior, when the probability of acceptance in ABC is the density of the perturbation noise at the public data given the pseudo-data. Which follows from the 1984 introduction of the ABC idea. On the opposite, EM does not enjoy an exact version, as the E step must be (unbiasedly) approximated by a Monte Carlo representation that relies on the same ABC algorithm. Which usually makes the M step harder, although a Monte Carlo version of the gradient is also available.
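To make the acceptance rule concrete, here is a minimal and purely illustrative Python sketch (names and signatures are mine, not Gong's), assuming a scalar summary released with Gaussian perturbation noise of known scale:

import numpy as np
from scipy.stats import norm

def abc_private(s_public, prior_draw, simulate_summary, noise_scale, n_keep=1000, seed=0):
    # rejection ABC where the acceptance probability is the Gaussian noise
    # density at (released summary - simulated summary), rescaled by its
    # maximum; the accepted draws are exact simulations from the posterior
    # attached to the perturbed, public data
    rng = np.random.default_rng(seed)
    bound = norm.pdf(0, scale=noise_scale)   # maximum of the noise density
    kept = []
    while len(kept) < n_keep:
        theta = prior_draw(rng)                 # placeholder prior simulator
        s_sim = simulate_summary(theta, rng)    # placeholder model simulator
        if rng.uniform() < norm.pdf(s_public - s_sim, scale=noise_scale) / bound:
            kept.append(theta)
    return np.array(kept)

The only approximation left is Monte Carlo, which is the serendipity pointed out in the quote above.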

dynamic mixtures and frequentist ABC

Posted in Statistics on November 30, 2022 by xi'an

This early morning in NYC, I spotted this new arXival by Marco Bee (whom I know from the time he was writing his PhD with my late friend Bernhard Flury) and found he has been working for a while on ABC related problems. The mixture model he considers therein is a form of mixture of experts, where the weights of the mixture components are not constant but functions of the entry, taking values in (0,1). This model was introduced by Frigessi, Haug and Rue in 2002 and is often used as a benchmark for ABC methods, since its normalising constant is unavailable, as in e.g.

f(x) \propto p(x) f_1(x) + (1-p(x)) f_2(x)

even with all entries being standard pdfs and cdfs. Rather than using a (costly) numerical approximation of the “constant” (as a function of all unknown parameters involved), Marco follows the approximate maximum likelihood approach of my Warwick colleagues, Javier Rubio [now at UCL] and Adam Johansen. It is based on the [SAME] remark that, under a uniform prior and using an approximation to the actual likelihood, the MAP estimator is also the MLE for that approximation. The approximation is ABC-esque in that a pseudo-sample is generated from the true model (attached to a simulation of the parameter) and the pair is accepted if the pseudo-sample stands close enough to the observed sample. The paper proposes to use the Cramér-von Mises distance, which only involves ranks. Given this “posterior” sample, an approximation of the posterior density is constructed and then numerically optimised. From a frequentist viewpoint, a direct estimate of the mode would be preferable. From my Bayesian perspective, this sounds like a step backwards, given that, once a posterior sample is available, reconnecting with an approximate MLE does not sound highly compelling.
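For illustration only, a minimal Python sketch of that approximate-MLE recipe, with simulate(theta, rng) a generic placeholder for sampling from the dynamic mixture and prior_sampler(rng) a uniform prior simulator over the parameter box (all names are mine, not the paper's):

import numpy as np
from scipy.stats import cramervonmises_2samp, gaussian_kde

def abc_mle(x_obs, simulate, prior_sampler, n_sim=10_000, keep_frac=0.01, seed=0):
    # ABC with the Cramér-von Mises distance between observed and pseudo-samples,
    # followed by a kernel density estimate of the retained draws, whose mode
    # stands for the approximate MLE (= MAP under the uniform prior)
    rng = np.random.default_rng(seed)
    thetas, dists = [], []
    for _ in range(n_sim):
        theta = prior_sampler(rng)         # placeholder: uniform prior simulator
        x_sim = simulate(theta, rng)       # placeholder: pseudo-sample of the same size
        thetas.append(theta)
        dists.append(cramervonmises_2samp(x_obs, x_sim).statistic)
    thetas = np.asarray(thetas)
    keep = thetas[np.argsort(dists)[: int(keep_frac * n_sim)]]  # closest pseudo-samples
    kde = gaussian_kde(keep.T)             # works for scalar or vector parameters
    return keep[np.argmax(kde(keep.T))]    # retained draw with highest estimated density

Returning the highest-density retained draw is a shortcut; the paper's version numerically optimises the density approximation instead.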

observed vs. complete in EM algorithm

Posted in Statistics on November 17, 2022 by xi'an

While answering a question related to the EM algorithm on X validated, I realised a global (or generic) feature of the (objective) E function, namely that

E(\theta'|\theta)=\mathbb E_{\theta}[\log\,f_{X,Z}(x^\text{obs},Z;\theta')|X=x^\text{obs}]

can always be written as

\log\,f_X(x^\text{obs};\theta')+\mathbb E_{\theta}[\log\,f_{Z|X}(Z|x^\text{obs},\theta')|X=x^\text{obs}]

therefore always includes the (log-) observed likelihood, at least in this formal representation. While the proof that EM monotonically increases the observed likelihood uses this decomposition as well, in that

\log\,f_X(x^\text{obs};\theta')=\log\,\mathbb E_{\theta}\left[\frac{f_{X,Z}(x^\text{obs},Z;\theta')}{f_{Z|X}(Z|x^\text{obs},\theta)}\big|X=x^\text{obs}\right]

I wonder if the appearance of the actual target in the temporary target E(θ’|θ) can be exploited any further.
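For the record (and spelling out what the post leaves implicit), the Jensen step turning the above identity into monotonicity reads

\log\,f_X(x^\text{obs};\theta')\geq\mathbb E_{\theta}\left[\log\frac{f_{X,Z}(x^\text{obs},Z;\theta')}{f_{Z|X}(Z|x^\text{obs},\theta)}\Big|X=x^\text{obs}\right]=E(\theta'|\theta)-\mathbb E_{\theta}[\log\,f_{Z|X}(Z|x^\text{obs},\theta)|X=x^\text{obs}]

with equality when θ'=θ, so that any θ' improving E(·|θ) over E(θ|θ) cannot decrease the observed log-likelihood.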

EM rocks!

Posted in Statistics on October 8, 2021 by xi'an

A rare occurrence of a statistics paper in Nature!, well, Nature Scientific Reports, where the authors, Jaya Prakesh, Umang Agarwal and Phaneendra K. Yalavarthy, describe using a parallel implementation of the EM algorithm for image reconstruction in rock tomography, due to the 1,887,436,800 x 1,887,436,800 matrix involved in the original 3D model.