Archive for arXiv

ABC with privacy

Posted in Books, Statistics on April 18, 2023 by xi'an


I very recently read a 2021 paper by Mijung Park, Margarita Vinaroz, and Wittawat Jitkrittum on running ABC while ensuring data privacy (published in Entropy).

“…adding noise to the distance computed on the real observations and pseudo-data suffices the privacy guarantee of the resulting  posterior samples”

For the ABC distance, they use maximum mean discrepancy (MMD) and for privacy the standard if unconvincing notion of differential privacy, defined by ensuring an upper bound on the amount of variation in the probability ratio when replacing/removing/adding an observation. (But it does not clearly convince users that their data is secure.)

While I have no reservation about the validation of the double-noise approach, I find it surprising that noise must be (twice) added when vanilla ABC is already (i) noisy, since based on random pseudo-data, and (ii) producing only a sample from an approximate posterior instead of returning an exact posterior. My impression indeed was that ABC should be good enough by itself to achieve privacy protection. In the sense that the accepted parameter values were those that generated random samples sufficiently close to the actual data, hence not only compatible with the true data, but also producing artificial datasets that are close enough to the data. Presumably these artificial datasets should not be produced as the intersection of their ε neighbourhoods may prove enough to identify the actual data. (The proposed algorithm does return all generated datasets.) Instead the supported algorithm involves randomisation of both tolerance ε and distance ρ to the observed data (with the side issue that they may become negative since the noise is Laplace).
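To make the double randomisation concrete, here is a minimal sketch of a rejection ABC where Laplace noise is added to both the tolerance ε and the MMD distance ρ. This is my own toy illustration, not the authors' algorithm: the Gaussian location model, the uniform prior, the kernel bandwidth, the single draw of the noisy tolerance, and the names `noisy_abc`, `eps_abc` and `noise_scale` are all assumptions, and the noise scale is an arbitrary placeholder that is not calibrated to any differential privacy guarantee.

```python
import math
import random

def gauss_kernel(a, b, bandwidth=1.0):
    # Gaussian kernel on the real line (bandwidth is an arbitrary choice).
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys):
    # Biased (V-statistic) estimate of the squared MMD between two samples.
    def mean_k(us, vs):
        return sum(gauss_kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

def laplace(rng, scale):
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_abc(obs, n_sims, eps_abc, noise_scale, seed=0):
    rng = random.Random(seed)
    # Randomised tolerance, drawn once here (an assumption); it may be negative.
    noisy_eps = eps_abc + laplace(rng, noise_scale)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                 # toy prior (assumption)
        pseudo = [rng.gauss(theta, 1.0) for _ in obs]  # toy model (assumption)
        # Randomised distance to the observed data; it may be negative as well.
        noisy_rho = mmd2(obs, pseudo) + laplace(rng, noise_scale)
        if noisy_rho <= noisy_eps:
            accepted.append(theta)
    return accepted
```

In this sketch only the accepted parameter values are returned, not the pseudo-datasets, and nothing prevents the noisy tolerance or the noisy distance from falling below zero, which is the side issue mentioned above.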

arX[g]iv[e]

Posted in Statistics on March 16, 2023 by xi'an

sampling using adaptive regenerative processes

Posted in Books, pictures, Statistics, Travel, University life on October 20, 2022 by xi'an

We just posted a new arXival on Sampling using Adaptive Regenerative Processes, written by Hector McKimm (Warwick), Andi Wang (soon Warwick), Murray Pollock (ex-Warwick), Gareth Roberts (Warwick) and myself. This is a collaboration that has been going on for a while, mostly via zoom in these Covid times. It builds upon the earlier paper of Wang et al. (2021) constructing the regeneration process (Restore), by aiming at improving this process by adapting the regeneration distribution and hence dramatically reducing the number of regenerations. Gaining in addition the ability to sample from target distributions for which simulation under a fixed regeneration distribution is computationally intractable. This work is part of Hector’s PhD, written at Warwick.

important Markov chains

Posted in Books, Statistics, University life on July 21, 2022 by xi'an

With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fosdick (1963), and later exploited in Liu et al. (2003) for instance. And somewhat dual of the vanilla Rao-Blackwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as the supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT. While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is estimated by a Bernoulli simulation.
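As a toy illustration of the replication step, here is a minimal sketch in which a random-walk Metropolis chain targets an instrumental N(0,2²) density and each state is repeated N times, with N drawn by residual sampling so that its expectation is ρ. The standard Normal target, the instrumental density, the scaling constant `kappa`, the proposal step and all function names are assumptions made for this example only, not the setting or the code of the paper.

```python
import math
import random

def log_target(x):
    # Target pi: standard Normal, up to an additive constant.
    return -0.5 * x * x

def log_instrumental(x, scale=2.0):
    # Instrumental density: N(0, scale^2), up to an additive constant.
    return -0.5 * (x / scale) ** 2

def residual_replicas(rho, rng):
    # N = floor(rho) + Bernoulli(fractional part of rho), hence E[N] = rho.
    base = math.floor(rho)
    return base + (1 if rng.random() < rho - base else 0)

def importance_markov_chain(n_iter, kappa=2.0, step=2.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    out = []
    for _ in range(n_iter):
        # Random-walk Metropolis move leaving the instrumental density invariant.
        y = x + step * rng.gauss(0.0, 1.0)
        log_ratio = log_instrumental(y) - log_instrumental(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        # Unnormalised importance weight, scaled by the tuning constant kappa.
        rho = kappa * math.exp(log_target(x) - log_instrumental(x))
        # Repeat the current state N times; states with N = 0 are dropped.
        out.extend([x] * residual_replicas(rho, rng))
    return out
```

For instance, `importance_markov_chain(20000)` returns a list of roughly 20000 values whose empirical mean and variance should be close to 0 and 1, since with these particular choices the expected number of replicas per iteration is one.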

day one at ISBA 22

Posted in pictures, Statistics, Travel, University life on June 29, 2022 by xi'an

Started the day with a much appreciated swimming practice in the [alas warm⁺⁺⁺] outdoor 50m pool on the Island with no one but me in the slooow lane. And had my first ride with the biXi system, surprised at having to queue behind other bikes at red lights! More significantly, it was a great feeling to reunite at last with so many friends I had not met for more than two years!!!

My friend Adrian Raftery gave the very first plenary lecture on his work on the Bayesian approach to long-term population projections, a work that was recently censored by some US States, then counter-censored by the Supreme Court [too busy to kill Roe v. Wade!]. Great to see the use of Bayesian methods validated by the UN Population Division [with at least one branch of the UN

Stephen Lauritzen returned to de Finetti's notion of a model as something not real or true at all, back to exchangeability. Making me wonder when exchangeability is more than a convenient assumption leading to the Hewitt-Savage theorem. And sufficiency. I mean, without falling into a Keynesian fallacy, each point of the sample has unique specificities that cannot be taken into account in an exchangeable model. Nice to hear some measure theory, though!!! Plus a comment on the median never being sufficient, echoing an older (and presumably not original) point of mine. Stephen’s (or Fisher’s?) argument being that the median cannot be recursively computed!

Antonietta Mira and I had our ABC session this afternoon with Cecilia Viscardi, Sirio Legramanti, and Massimiliano Tamborrino (Warwick) as speakers. Cecilia linked ABC with normalising flows, in collaboration with Dennis Prangle (whose earlier paper on this connection was presented as the first One World ABC seminar). Thus using past simulations to approximate the posterior by a neural network, possibly with a significant increase in computing time when compared with more rudimentary SMC-ABC methods in larger dimensions. Sirio considered summary-free ABC based on discrepancies like Rademacher complexity. Which more or less contains MMD, Kullback-Leibler, Wasserstein and more, although it seems to be dependent on the parameterisation of the observations. An interesting opening at the end was that this approach could apply to non-iid settings. Massi presented a paper coauthored with Umberto that had just been arXived. On sequential ABC with a dependence on the summary statistic (hence guided). Further bringing copulas into the game, although this forces another choice [for the marginals] in the method.

Tamara Broderick talked about a puzzling leverage effect of some observations in economic studies where a tiny portion of individuals may modify the significance or the sign of a coefficient, for which I cannot tell whether the data or the reliance on statistical significance is to blame. Robert Kohn presented mixture-of-Gaussian copulas [not to be confused with mixture of Gaussian-copulas!] and Nancy Reid concluded my first [and somewhat exhausting!] day at ISBA with a BFF talk on the different statistical paradigms' takes on confidence (for which the notion of calibration seems to remain frequentist).

Side comments: First, most people in the conference are wearing masks, which is great! Also, I find it hard to read slides from the screen, which I presume is an age issue (?!) Even more aside, I had Korean lunch in a place that refused to serve me a glass of water, which I find amazing.
