Archive for differential privacy

Privacy-preserving Computing [book review]

Posted in Books, Statistics on May 13, 2024 by xi'an

Privacy-preserving Computing for Big Data Analytics and AI, by Kai Chen and Qiang Yang, is a rather short 2024 CUP book, translated (by the authors) from the 2022 Chinese version. It covers secret sharing, homomorphic encryption, oblivious transfer, garbled circuits, differential privacy, trusted execution environments, federated learning, privacy-preserving computing platforms, and case studies. The style is survey-like, meaning it often is too light for my liking, with too many lists of versions and extensions, and, more importantly, it lacks the detail needed to rely (solely) on it for a course, at several points standing closer to a Wikipedia-level introduction to a topic. For instance, the chapter on homomorphic encryption [Chap.5] does not connect with the (presumably narrow) picture I have of this method. And the chapter on differential privacy [Chap.6] does not get much further than Laplace and Gaussian randomization; in eg the stochastic gradient perturbation of Abadi et al. (2016), the privacy requirement is hardly discussed. The chapter on federated learning [Chap.8] is longer, if not much more detailed, being based on an entire book on federated learning of which Qiang Yang is the primary author. (With all figures in that chapter being reproduced from said book.) The next chapter [Chap.9] describes to some extent several computing platforms that can be used for privacy purposes, such as FATE, CryptDB, MesaTEE, Conclave, and PrivPy, while the final one goes through case studies from different areas, but without enough depth to be truly formative for neophyte readers and students. Overall, too light for my liking.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

data protection [not from Les Houches]

Posted in Books, Mountains, Statistics on March 16, 2024 by xi'an

While I was running a “kitchen” workshop on Bayesian privacy in Les Houches, Le Monde published on π day a recap of a recent report on AI commissioned by the French Government. Among other things, it contains recommendations on alleviating the administrative blocks to accessing personal data, based on a model for data protection created decades earlier around the CNIL structure. The final paragraph calls for the creation of a “laboratory” that would test collaborative, altruistic, efficient models for sharing data for learning, which is one of the main goals of OCEAN, without however mentioning any technical aspect, like the adoption of some privacy measure at a national or European level.

Bayesian inference and conformal prediction

Posted in Books, Kids, Statistics, University life on October 10, 2023 by xi'an

Exact MCMC with differentially private moves

Posted in Statistics on September 25, 2023 by xi'an

“The algorithm can be made differentially private while remaining exact in the sense that its target distribution is the true posterior distribution conditioned on the private data (…) The main contribution of this paper arises from the simple observation that the penalty algorithm has a built-in noise in its calculations which is not desirable in any other context but can be exploited for data privacy.”

Another privacy paper, by Yıldırım and Ermiş (in Statistics and Computing, 2019), on how MCMC can ensure privacy. For free. The original penalty algorithm of Ceperley and Dewing (1999) is a form of Metropolis-Hastings algorithm where the acceptance probability is replaced with an unbiased estimate: when there exists a Normal and unbiased estimate of the log-acceptance ratio λ(θ, θ’), its exponential can be corrected, by subtracting half the noise variance, to remain unbiased. In that case, the algorithm remains exact.
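To make the mechanics concrete, here is a minimal Python sketch (my own toy illustration, not code from the paper) of the penalty algorithm: a Normal perturbation of known variance σ² is injected into λ(θ, θ’), and the σ²/2 correction in the acceptance test keeps the chain exactly targeting the posterior. The N(θ,1) likelihood, flat prior, and random-walk proposal are assumptions made for the example.

import numpy as np

# Toy penalty algorithm à la Ceperley and Dewing (1999): the exact
# log-acceptance ratio lambda(theta, theta') is observed with added
# N(0, sigma2) noise, and subtracting sigma2/2 in the acceptance test
# keeps the chain exact for the true posterior.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=100)   # assumed N(theta, 1) model
sigma2 = 0.5                            # variance of the injected noise
theta, chain = 0.0, []

def log_ratio(prop, cur):
    # exact log-likelihood ratio for the toy Gaussian model (flat prior)
    return np.sum((data - cur) ** 2 - (data - prop) ** 2) / 2

for _ in range(10_000):
    prop = theta + rng.normal(0.0, 0.3)             # random-walk proposal
    lam = log_ratio(prop, theta) + rng.normal(0.0, np.sqrt(sigma2))
    if np.log(rng.uniform()) < lam - sigma2 / 2:    # penalty correction
        theta = prop
    chain.append(theta)

Note that σ² must be known to the sampler, which is what makes the exact correction possible.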

“Adding noise to λ(θ, θ’) may help with preserving some sort of data privacy in a Bayesian framework where [the posterior], hence λ(θ, θ’), depends on the data.”

Rather than being forced into replacing the Metropolis-Hastings acceptance probability with an unbiased estimate as in pseudo-marginal MCMC, the trick here is to replace λ(θ, θ’) with a Normal perturbation, hence preserving both the target (as shown by Ceperley and Dewing, 1999) and the data privacy, by returning a noisy likelihood ratio. Then, assuming that the sensitivity function for the log-likelihood [the maximum c(θ, θ’), over pairs of observations, of the difference between log-likelihood ratios at two arbitrary parameter values θ and θ’] decreases as a power of the sample size n, the penalty algorithm is differentially private, provided the noise variance is large enough (in connection with c(θ, θ’)) after a certain number of MCMC iterations. Yıldırım and Ermiş (2019) show that the setting covers the case of distributed, private data, even though the efficiency decreases with the number of (protected) data silos. (Another drawback is that the data owners must keep exchanging likelihood ratio estimates.)
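In formula form [my transcription of the verbal definition above, so the notation may differ from the paper's], the sensitivity reads

c(\theta,\theta')=\max_{x,y}\left|\,[\log f(x|\theta)-\log f(x|\theta')]-[\log f(y|\theta)-\log f(y|\theta')]\,\right|

where the maximum is over pairs of observations (x,y), that is, over neighbouring datasets differing in a single entry.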

 

Bayesian differential privacy for free?

Posted in Books, pictures, Statistics on September 24, 2023 by xi'an

“We are interested in the question of how we can build differentially-private algorithms within the Bayesian framework. More precisely, we examine when the choice of prior is sufficient to guarantee differential privacy for decisions that are derived from the posterior distribution (…) we show that the Bayesian statistician’s choice of prior distribution ensures a base level of data privacy through the posterior distribution; the statistician can safely respond to external queries using samples from the posterior.”

Recently I came across this 2016 JMLR paper by Christos Dimitrakakis et al. on “how Bayesian inference itself can be used directly to provide private access to data, with no modification.” Which comes as a surprise, since it implies that Bayesian sampling would be enough, per se, to keep the data private while making the information it conveys available. The main assumption on which this result is based is a Lipschitz continuity of the model density, namely that, for a specific (pseudo-)distance ρ,

|\log f(x|\theta)-\log f(y|\theta)|\le L\rho(x,y)

uniformly in θ over a set Θ with enough prior mass

\pi(\Theta)\ge 1-e^{-\epsilon}

for an ε>0. In this case, the Kullback-Leibler divergence between the posteriors π(θ|x) and π(θ|y) is bounded by a constant times ρ(x,y), the constant being 2L when Θ is the entire parameter space. This condition ensures differential privacy on the posterior distribution (and even more on the associated MCMC sample): more precisely, the posterior is (2L,0)-differentially private when Θ is the entire parameter space. While there is an efficiency issue with the result, since the bound L is set by the model and hence immovable, this remains a fundamental result for the field (as shown by its high number of citations).
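As a minimal numerical illustration (my own toy example, not from the paper), consider a Bernoulli model with parameter space taken to be Θ = [δ, 1−δ] and ρ(x,y) = |x−y| on {0,1}: the Lipschitz constant is then L = log((1−δ)/δ), which the following Python snippet confirms on a grid.

import numpy as np

# Check the Lipschitz condition |log f(x|theta) - log f(y|theta)| <= L |x - y|
# for a Bernoulli model with theta in [delta, 1 - delta] (assumed setting).
delta = 0.1
thetas = np.linspace(delta, 1 - delta, 1001)

def log_lik(x, theta):
    return x * np.log(theta) + (1 - x) * np.log(1 - theta)

# on {0, 1} only the pair x=1, y=0 matters: gap = |log(theta / (1 - theta))|
gaps = np.abs(log_lik(1, thetas) - log_lik(0, thetas))
print(gaps.max(), np.log((1 - delta) / delta))  # both about 2.197

With this restricted parameter space playing the role of Θ, releasing posterior samples would be (2L,0)-differentially private, here roughly (4.39,0): the guarantee degrades quickly as δ shrinks, illustrating the efficiency issue above.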