Archive for Statistics and Computing

Exact MCMC with differentially private moves

Posted in Statistics on September 25, 2023 by xi'an

“The algorithm can be made differentially private while remaining exact in the sense that its target distribution is the true posterior distribution conditioned on the private data (…) The main contribution of this paper arises from the simple observation that the penalty algorithm has a built-in noise in its calculations which is not desirable in any other context but can be exploited for data privacy.”

Another privacy paper, by Yıldırım and Ermiş (in Statistics and Computing, 2019), on how MCMC can ensure privacy. For free. The original penalty algorithm of Ceperley and Dewing (1999) is a form of Metropolis-Hastings algorithm where the acceptance probability is replaced with an unbiased estimate (e.g., there exists an unbiased and Normal estimate of the log-acceptance ratio, λ(θ, θ’), whose exponential can be corrected to remain unbiased). In that case, the algorithm remains exact.
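To make the correction explicit (in my notation, not the paper’s): if the log-acceptance ratio estimate is Normal with known variance, shifting it down by half that variance before exponentiating restores unbiasedness, by the mean of a log-Normal variate:

```latex
% hat-lambda is a Normal estimate of the log-acceptance ratio lambda(theta,theta')
% with known variance sigma^2; then, by the log-Normal mean formula,
\[
  \hat\lambda \sim \mathcal{N}\big(\lambda(\theta,\theta'),\,\sigma^2\big)
  \quad\Longrightarrow\quad
  \mathbb{E}\Big[\exp\big\{\hat\lambda - \sigma^2/2\big\}\Big]
  = \exp\big\{\lambda(\theta,\theta')\big\}\,.
\]
```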

“Adding noise to λ(θ, θ′) may help with preserving some sort of data privacy in a Bayesian framework where [the posterior], hence λ(θ, θ′), depends on the data.”

Rather than being forced into replacing the Metropolis-Hastings acceptance probability with an unbiased estimate as in pseudo-marginal MCMC, the trick here is to replace λ(θ, θ’) with a Normal perturbation, hence preserving both the target (as shown by Ceperley and Dewing, 1999) and the data privacy, by returning a noisy likelihood ratio. Then, assuming that the difference sensitivity function for the log-likelihood [the maximum over pairs of observations of the difference c(θ, θ’) between log-likelihoods at two arbitrary parameter values θ and θ’] decreases as a power of the sample size n, the penalty algorithm is differentially private, provided the variance is large enough (in connection with c(θ, θ’)) after a certain number of MCMC iterations. Yıldırım and Ermiş (2019) show that the setting covers the case of distributed, private data, even though the efficiency decreases with the number of (protected) data silos. (Another drawback is that the data owners must keep exchanging likelihood ratio estimates.)
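Here is a minimal sketch of the resulting move, as I understand it, with all names (penalty_mh, sig2, &tc.) mine rather than the authors’: only a noisy version of the log-acceptance ratio is released, and the Metropolis test is shifted by half the noise variance so that the true posterior remains the exact target:

```r
# minimal sketch of the penalty algorithm with privatising Normal noise,
# in the spirit of Ceperley & Dewing (1999) and Yıldırım & Ermiş (2019);
# illustrative code, not the authors' implementation
penalty_mh <- function(niter, init, logpost, prop_sd, sig2) {
  theta <- init
  chain <- numeric(niter)
  for (t in 1:niter) {
    prop <- theta + rnorm(1, sd = prop_sd)            # random walk proposal
    lambda <- logpost(prop) - logpost(theta)          # exact log-ratio λ(θ,θ')
    lambda_hat <- lambda + rnorm(1, sd = sqrt(sig2))  # only this noisy version is released
    if (log(runif(1)) < lambda_hat - sig2 / 2)        # penalised Metropolis test
      theta <- prop
    chain[t] <- theta
  }
  chain
}

# toy illustration: Normal mean with flat prior
y <- rnorm(100, mean = 1)
logpost <- function(theta) sum(dnorm(y, theta, 1, log = TRUE))
out <- penalty_mh(1e4, init = 0, logpost = logpost, prop_sd = .5, sig2 = 1)
```

Exactness holds on average over the injected noise, so the variance sig2 must be known; and the larger sig2, the stronger the privacy protection, at the cost of a lower acceptance rate.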


a message from the Editor of Statistics & Computing

Posted in Books, Statistics, University life on July 15, 2022 by xi'an

[This is a message from Ajay Jasra, new Editor in Chief of Statistics & Computing, regarding submissions (and another stone in Springer’s garden).]

Subject: New Submissions at Statistics and Computing

Dear Prospective Authors,

As you may be aware, Springer has introduced a new system for the management of article submissions. Despite my best efforts, there are several missing functionalities which make efficient management of article submissions virtually impossible. We do expect the system to be fixed by the new year, but that does not help us in the short term.

I would therefore request that all new submissions, until further notice, be made on the old Editorial Manager:

https://www.editorialmanager.com/stco/default1.aspx

so that we can properly handle your manuscript.

Kind Regards,

Ajay Jasra
EIC Statistics & Computing

MCMC, with common misunderstandings

Posted in Books, pictures, R, Statistics, University life on January 27, 2020 by xi'an

As I was asked to write a chapter on MCMC methods for an incoming Handbook of Computational Statistics and Data Science, to be published by Wiley, rather than cautiously declining!, I decided to recycle the answers I wrote on X validated to what I considered to be the most characteristic misunderstandings about MCMC and other computing methods, using as background the introduction produced by Wu Changye in his PhD thesis. I am now waiting for the opinion of the editors of the Handbook on this Q&A style. The outcome is certainly lighter than other recent surveys, like the one we wrote with Peter Green, Krys Łatuszyński, and Marcelo Pereyra for Statistics and Computing, or the one with Victor Elvira, Nick Tawn, and Changye Wu.

Springer no more!

Posted in Books, Kids, Statistics, University life on April 4, 2018 by xi'an

Just learned that, starting from tomorrow night, I will no longer have access to any of the Springer journals, as the negotiations between Springer and the consortium of French universities, research institutes, higher education schools, and museums have failed, the commercial publisher refusing to stem its ever-increasing fees while happily taking in the fast-increasing open access fees it pressures out of authors, a unique example of triple taxation (researchers’ salaries, open access duties, and enormous non-negotiable subscription rates for the whole package of journals)… The French institutions are thus following their German counterparts. Well, this is an opportunity for the boards of all these journals to withdraw and create a phantom version of their former journal, evaluating and reviewing papers already available on arXiv! And I should definitely get my act together, rise from my winter-is-coming lethargy, and launch PCI Comput Stat now!!!

parameter space for mixture models

Posted in Statistics, University life on March 24, 2017 by xi'an

“The paper defines a new solution to the problem of defining a suitable parameter space for mixture models.”

When I received the table of contents of an incoming issue of Statistics & Computing and saw a paper by V. Maroufy and P. Marriott about the above, I was quite excited about a new approach to mixture parameterisation, especially after our recent reposting of the weakly informative reparameterisation paper. Alas, after reading the paper, I fail to see the (statistical) point of the whole exercise.

Starting from the basic fact that mixtures face many identifiability issues, not only invariance by component permutation, but the possibility to add spurious components as well, the authors move to an entirely different galaxy by defining mixtures of so-called local mixtures, developed by one of the authors. The notion is just incomprehensible to me: the object is a weighted sum of the basic component of the original mixture, e.g., a Normal density, and of k of its derivatives wrt its mean, a sort of parameterised Taylor expansion (see my attempt at a formula below). Which incidentally implies the parameter is unidimensional. The weights of this strange mixture are furthermore constrained by the positivity of the resulting mixture, a constraint that seems impossible to satisfy in the Normal case when the number of derivatives is odd. And that is hard to analyse in any case, since possibly negative components do not enjoy an interpretation as a probability density. In exponential families, the local mixture is the original exponential family density multiplied by a polynomial.

The current paper moves one step further [from the reasonable] by considering mixtures [in the standard sense] of such objects, whose components are parameterised by their mean parameter and a collection of weights. The authors then restrict the mean parameters to belong to a finite and fixed set, whose elements are determined by a maximum error rate on any compound distribution derived from this exponential family structure. The remainder of the paper discusses the choice of the mean parameters and an EM algorithm to estimate the parameters, with a confusing lower bound on the mixture weights that impacts the estimation of these weights. And no mention is made of the positivity constraint. I remain completely bemused by the paper and its purpose: I do not even fathom how this qualifies as a mixture.
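For the record, here is the local mixture object as I understand it, written in my own notation (and hence possibly unfaithful to the authors’):

```latex
% local mixture of a baseline density f(x;mu) with scalar mean parameter mu,
% built from k derivatives with respect to mu:
\[
  g(x;\mu,w) \;=\; f(x;\mu) \;+\; \sum_{j=1}^{k} w_j\,
    \frac{\partial^j}{\partial\mu^j} f(x;\mu),
  \qquad g(x;\mu,w) \,\ge\, 0 \ \text{ for all } x.
\]
% in the Normal case f(x;mu)=phi(x-mu), the j-th derivative is He_j(x-mu)phi(x-mu),
% with He_j the (probabilist's) Hermite polynomial, so positivity requires
% 1 + sum_j w_j He_j(x-mu) >= 0 for all x, which cannot hold when the leading
% term has odd degree, as an odd-degree polynomial diverges to -infinity on one side.
```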