Archive for Annals of Statistics

robust privacy

Posted in Books, Statistics, University life on May 14, 2024 by xi'an

During a recent working session, some members of the OCEAN ERC project (incl. me) went through Privacy-Preserving Parametric Inference: A Case for Robust Statistics by Marco Avella-Medina (JASA, 2022), where robust criteria are advanced as efficient statistical tools in private settings. In this paper, robustness means using M-estimators T, seen as functions of the empirical cdf, associated with a score function Ψ through the estimating equation

\sum_{i=1}^n\Psi(x_i,T(\hat F_n))=0,

where Ψ is bounded. The construction further requires that one can assess the sensitivity (in the sense of Dwork et al., 2006) of the queried function, a sensitivity itself linked with the level of differential privacy. Because standard robustness approaches à la Huber allow for a portion of the sample to come from an outlying (arbitrary) distribution, as in ε-contamination models, it makes perfect sense that robustness emerges within the differential privacy framework. However, this common-sense connection is not enough on its own to achieve differential privacy, and the paper introduces a further randomisation, with noise scaled by (n,ε,δ), as in

T(\hat F_n)+\gamma(T,\hat F_n)\cdot 5\sqrt{2\log(n)\log(2/\delta)/\epsilon_n}\,Z

that also applies to test statistics. This scaling seems to constitute the central result of the paper, which establishes asymptotic validity in the sense of statistical consistency (as the sample size n grows). But I am left wondering whether this outcome counts as supporting differential privacy as a sensible notion…
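As a rough illustration (a minimal sketch of my own, not the paper's actual construction), the noise-calibrated release in the display above could be mimicked in Python for a Huber location M-estimator, with the sensitivity term γ supplied by the user as a plain constant:

import numpy as np
from scipy.optimize import brentq

def huber_psi(r, c=1.345):
    # bounded score of the Huber location M-estimator
    return np.clip(r, -c, c)

def m_estimate(x, c=1.345):
    # solve sum_i Psi(x_i - t) = 0 in t (the summed score is monotone in t)
    return brentq(lambda t: np.sum(huber_psi(x - t, c)), np.min(x) - 1.0, np.max(x) + 1.0)

def private_m_estimate(x, eps_n, delta, gamma, rng=None):
    # add Gaussian noise scaled as in the displayed formula; gamma stands in
    # for the sensitivity term gamma(T, F_n), left to the user in this sketch
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    scale = gamma * 5 * np.sqrt(2 * np.log(n) * np.log(2 / delta) / eps_n)
    return m_estimate(x) + scale * rng.standard_normal()

x = np.random.default_rng(0).standard_t(df=2, size=500)
print(m_estimate(x), private_m_estimate(x, eps_n=1.0, delta=1e-5, gamma=0.01))  # gamma set arbitrarily here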

“…our proofs for the convergence of noisy gradient descent and noisy Newton’s method rely on showing that with high probability, the noise introduced to the gradients and Hessians has a negligible effect on the convergence of the iterates (up to the order of the statistical error of the non-noisy versions of the algorithms).” Avella-Medina, Bradshaw, & Loh

As a sequel, I then read a more recent publication by Avella-Medina, Differentially private inference via noisy optimization, written with Casey Bradshaw & Po-Ling Loh, which appeared in the Annals of Statistics (2023). It again considers privatised estimation and inference for M-estimators, this time obtained through noisy optimisation procedures (noisy gradient descent, noisy Newton's method) and noisy confidence regions, which output differentially private avatars of standard M-estimators. Here the noisification goes through a randomisation of the gradient step, as in

\theta^{(k+1)}=\theta^{(k)}-\frac{\eta}{n}\sum_i\Psi(x_i,\theta^{(k)})+\frac{\eta B\sqrt K}{n}Z_k

where B is an upper bound on the (gradient) score Ψ, η is a discretisation step, and K is the total number of iterations (hence fixed in advance). The above stochastic gradient sequence converges with high probability to the actual M-estimator in n, not in K, since the upper bound on the distance scales as √K/n. Where does the attached privacy guarantee come from? It follows from a composition argument applied to the sequence of differentially private outputs, all based on the same dataset.
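In (hypothetical) code, the noisy gradient iteration could look like the following minimal Python sketch of the displayed update, not the authors' implementation, with the bound B, step η, and iteration count K supplied by the user:

import numpy as np

def noisy_gradient_descent(x, psi, theta0, eta, K, B, rng=None):
    # schematic one-dimensional version of the update
    #   theta <- theta - (eta/n) sum_i Psi(x_i, theta) + (eta B sqrt(K) / n) Z_k
    # where psi(x, theta) is assumed bounded by B in absolute value
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    theta = theta0
    for _ in range(K):  # K is fixed in advance, as required by the composition argument
        grad = np.sum(psi(x, theta)) / n
        noise = (B * np.sqrt(K) / n) * rng.standard_normal()
        theta = theta - eta * grad + eta * noise
    return theta

# toy usage with the Huber score of the location parameter (sign chosen so that
# psi is the gradient of the Huber loss in theta)
huber = lambda x, t: np.clip(t - x, -1.345, 1.345)
x = np.random.default_rng(1).standard_t(df=2, size=1000)
print(noisy_gradient_descent(x, huber, theta0=0.0, eta=1.0, K=25, B=1.345))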

“…the larger the number [K] of data (gradient) queries of the algorithm, the more prone it will be to privacy leakage.”

The Newton method version is a variation on the above noisy gradient descent, except that it seems to converge faster, as illustrated in the paper.
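Purely as an illustrative variation on the sketch above (one dimension, same arbitrary noise scale on gradient and Hessian, reusing the imports and Huber score from the previous snippet), the Newton step would replace the fixed step size by a noisy curvature term:

def noisy_newton(x, psi, dpsi, theta0, K, B, rng=None):
    # schematic noisy Newton iteration: both the averaged score and its
    # derivative (the Hessian in one dimension) receive Gaussian noise
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    theta = theta0
    for _ in range(K):
        grad = np.sum(psi(x, theta)) / n + (B * np.sqrt(K) / n) * rng.standard_normal()
        hess = np.sum(dpsi(x, theta)) / n + (B * np.sqrt(K) / n) * rng.standard_normal()
        theta = theta - grad / max(hess, 1e-3)  # guard against a small or negative noisy Hessian
    return theta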

Bayes’s theorem for improper mixtures

Posted in Books, Statistics, University life on July 19, 2023 by xi'an

While looking for references for a Master's summer project at Warwick on Bayesian inference for the Cauchy location parameter, I came across a 2011 Annals of Statistics paper by Peter McCullagh and Han Han, which expands the Bayesian framework to the improper case by considering a Poisson process over the parameter set, with mean measure ν the improper prior. Instead of a single random parameter, this construct returns a countable collection of pairs (θ,y), while the observations induce a subset of that collection constrained by y∈A, a “sampling region” that is both essential to the derivation of the joint distribution and obscure, in that A remains unspecified (except that 0<ν(A)<∞ and that it conveniently returns the observed sample of y’s).

“Provided that the key finiteness condition is satisfied, this probabilistic analysis of the extended model may be interpreted as a vindication of improper Bayes procedures derived from the original model.”

“Thus, the existence of a joint probability model associated with an improper prior does not imply optimality in the form of coherence, consistency or admissibility.”

This is definitely fascinating, even though I have trouble linking this infinite sequence of θ‘s with regular Bayesian inference, since the examples in the paper seem to revert to a single parameter value, as in §4.1 for the Normal model and §5 for the Cauchy model. The authors also revisit the marginalisation paradoxes of Dawid, Stone and Zidek (1973), arguing that the improper measure leading to the paradox is not compatible with ν(A)<∞, hence does not define a natural conditional, while the “other” improper measure avoids the paradox.
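For the flat-prior Normal location example of §4.1, the construction can actually be simulated exactly (my own toy illustration, not code from the paper): with ν the Lebesgue measure and y|θ ~ N(θ,1), the induced mean measure of y is again Lebesgue, so the points of the process falling in a bounded sampling region A=[a,b] are Poisson(b−a) in number, with y’s uniform over A and θ|y ~ N(y,1), i.e., the usual flat-prior posterior.

import numpy as np

def improper_mixture_points(a, b, rng=None):
    # simulate the Poisson process on (theta, y) with mean measure
    # Lebesgue(d theta) x N(theta,1)(dy), restricted to the region y in A=[a,b];
    # the induced measure of A is b - a, finite as required
    rng = np.random.default_rng() if rng is None else rng
    n_points = rng.poisson(b - a)         # Poisson count with mean b - a
    y = rng.uniform(a, b, size=n_points)  # marginal of y over A is uniform
    theta = rng.normal(loc=y, scale=1.0)  # theta given y is N(y, 1)
    return theta, y

theta, y = improper_mixture_points(-2.0, 2.0)
print(len(y), theta[:3], y[:3])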

statistical analysis of GANs

Posted in Books, Statistics on May 24, 2021 by xi'an

My friend Gérard Biau and his coauthors published a paper in the Annals of Statistics last year on the theoretical [statistical] analysis of GANs, which I had missed and recently read with definite interest in the issues. (With no image example!)

If the discriminator is unrestricted, the unique optimal solution is the Bayes posterior probability

\dfrac{p^\star(x)}{p^\star(x)+p_\theta(x)}

when the model density is everywhere positive. The optimal parameter θ then corresponds to the closest model in terms of Kullback-Leibler divergence, that is, to the pseudo-true value of the parameter. This is however the ideal situation, while in practice the discriminator D is restricted to a parametric family. In this case, if the family is wide enough to approximate the ideal discriminator in the sup norm, with an error of order ε, and if the parameter space Θ is compact, the optimal parameter found under the restricted family approximates the pseudo-true value in the sense of the GAN loss, at the order ε². With a stronger assumption on the family’s ability to approximate any discriminator, the same property holds for the empirical version (and in expectation). (As an aside, the figure illustrating this property confusingly uses a histogram-like rectangle to indicate the expectation of the discriminator loss!) And both parameter estimators (of θ and α) converge to the optimal values with the sample size. An interesting foray by statisticians into a method whose statistical properties are rarely if ever investigated. It is missing a comparison with alternative approaches, like MLE, though.
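As a toy illustration of the above display (mine, not the authors’), here is the unrestricted optimal discriminator when both the true density p* and the model density p_θ are Gaussian:

import numpy as np
from scipy.stats import norm

def optimal_discriminator(x, p_star, p_theta):
    # unrestricted optimum D*(x) = p*(x) / (p*(x) + p_theta(x))
    num = p_star(x)
    return num / (num + p_theta(x))

# true N(0,1) data versus a N(1,1) generator
p_star = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
p_theta = lambda x: norm.pdf(x, loc=1.0, scale=1.0)
grid = np.linspace(-3, 4, 8)
print(optimal_discriminator(grid, p_star, p_theta))  # near 1 where p* dominates, 1/2 where the densities cross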

linearity, reversed

Posted in Books, Kids on September 19, 2020 by xi'an

While answering a question on X validated about the posterior mean being a weighted sum of the prior mean and of the maximum likelihood estimator, with weights that do not depend on the data, which is true in conjugate natural exponential family settings, I re-read the wonderful 1979 paper by Diaconis & Ylvisaker establishing the converse, namely that when such a linear combination holds, the prior must be conjugate! This holds within exponential families, but I cannot think of a reasonable case outside exponential families where the linearity holds (again with constant weights, as otherwise it always holds in dimension one, albeit with weights possibly outside [0,1]).
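For instance, in the textbook Normal case x_1,…,x_n ~ N(θ,σ²) with conjugate prior θ ~ N(μ_0,τ²), the posterior mean writes

\mathbb{E}[\theta\mid x_{1:n}]=\frac{\tau^2}{\tau^2+\sigma^2/n}\,\bar{x}_n+\frac{\sigma^2/n}{\tau^2+\sigma^2/n}\,\mu_0

with weights in [0,1] free of the data, in agreement with the Diaconis & Ylvisaker characterisation.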

Xmas tree at UCL, with a special gift

Posted in Books, pictures, Statistics, Travel, University life on November 26, 2019 by xi'an

Ph.D. students at UCL Statistics have made this Xmas tree out of bound and unbound volumes of statistics journals, not too hard to spot (especially the Current Indexes which I abandoned when I left my INSEE office a few years ago). An invisible present under the tree is the opening of several positions, namely two permanent lectureships and two three-year research fellowships, all in Statistics or Applied Probability, with the fellowship deadline being the 1st of December 2019!