“The algorithm can be made differentially private while remaining exact in the sense that its target distribution is the true posterior distribution conditioned on the private data (…) The main contribution of this paper arises from the simple observation that the penalty algorithm has a built-in noise in its calculations which is not desirable in any other context but can be exploited for data privacy.”
Another privacy paper by Yldirim and Ermis (in Statistics and Computing, 2019) on how MCMC can ensure privacy. For free. The original penalty algorithm of Ceperley and Dewing (1999) is a form of Metropolis-Hastings algorithm where the Metropolis-Hastings acceptance probability is replaced with an unbiased estimate (e.g., there exists an unbiased and Normal estimate of the log-acceptance ratio, λ(θ, θ’), whose exponential can be corrected to remain unbiased). In that case, the algorithm remains exact.
“Adding noise to λ(θ, θ) may help with preserving some sort of data privacy in a Bayesian framework where [the posterior], hence λ(θ, θ), depends on the data.”
Rather than being forced into replacing the Metropolis-Hastings acceptance probability with an unbiased estimate as in pseudo-marginal MCMC, the trick here is in replacing λ(θ, θ’) with a Normal perturbation, hence preserving both the target (as shown by Ceperley and Dewing (1999)) and the data privacy, by returning a noisy likelihood ratio. Then, assuming that the difference sensitivity function for the log-likelihood [the maximum difference c(θ, θ’) over pairs of observations of the difference between log-likelihoods at two arbitrary parameter values θ and θ’] is decreasing as a power of the sample size n, the penalty algorithm is differentially private, provided the variance is large enough (in connection with c(θ, θ’)] after a certain number of MCMC iterations. Yldirim and Ermis (2019) show that the setting covers the case of distributed, private, data. even though the efficiency decreases with the number of (protected) data silos. (Another drawback is that the data owners must keep exchanging likelihood ratio estimates.