Archive for data privacy

data protection [not from Les Houches]

Posted in Books, Mountains, Statistics with tags , , , , , , , , , , , on March 16, 2024 by xi'an

While I was running a “kitchen” workshop on Bayesian privacy in Les Houches, Le Monde published on π day a recap of a recent report on AI commissioned by the French Government. Among other things, the report recommends alleviating the administrative blocks to accessing personal data, which stem from a model of data protection built decades earlier around the CNIL structure. Its final paragraph calls for the creation of a “laboratory” that would test collaborative, altruistic, and efficient models for sharing data for learning, which is one of the main goals of OCEAN. It does not, however, mention any technical aspect, such as the adoption of a specific privacy measure at the national or European level.

postdoctoral research positions at PariSanté

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on March 7, 2024 by xi'an

Thanks to the 2023-2029 ERC Synergy grant OCEAN (On intelligenCE And Networks: Synergistic research in Bayesian Statistics, Microeconomics and Computer Sciences), I am seeking one or two postdoctoral researchers with an interest in Bayesian federated learning, distributed MCMC, approximate Bayesian inference and computing, and data privacy.

The project is based at Université Paris Dauphine, on the new PariSanté Campus. The postdocs will join the OCEAN teams of researchers directed by Éric Moulines and myself (Christian P Robert) to work on the above themes, with multiple possible foci, from statistical theory to Bayesian methodology, decision theory, algorithms, and medical applications. Collaborations with the OCEAN teams of researchers directed by Michael Jordan (Berkeley) and Gareth Roberts (Warwick) will further be encouraged and related travel will be supported.

Qualifications

The candidates should hold a doctorate in applied maths, statistics or machine learning, with demonstrated skills in Bayesian analysis, game theory, Monte Carlo methodology or numerical probability, an excellent record of publications in these domains, and an interest in working as part of an interdisciplinary international team. Scientific maturity and research autonomy are a must for applying. There is no deadline for the positions, which will be filled when a suitable candidate is selected.

Funding

Besides a two-year postdoctoral contract at Université Paris Dauphine (with a possible extension for another year), at a salary of 31K€ per year, the project will fund travel to OCEAN partner institutions (the University of Warwick and the University of California, Berkeley) and participation in yearly summer schools and conferences. Standard French university benefits are attached to the position and, as per ERC rules, no teaching duty is involved.

The starting date of the postdoctoral positions is negotiable depending on the applicants’ availability.

Application Procedure

To apply, please send the following items in one pdf file to Christian Robert (bayesianstatistics@gmail.com):

  • a letter of application,
  • a CV.

Letters of recommendation are to be sent directly by their author.

consolidator grants 2023

Posted in Statistics with tags , , , , , , , , , , on November 28, 2023 by xi'an

Exact MCMC with differentially private moves

Posted in Statistics with tags , , , , , , , on September 25, 2023 by xi'an

“The algorithm can be made differentially private while remaining exact in the sense that its target distribution is the true posterior distribution conditioned on the private data (…) The main contribution of this paper arises from the simple observation that the penalty algorithm has a built-in noise in its calculations which is not desirable in any other context but can be exploited for data privacy.”

Another privacy paper, by Yıldırım and Ermiş (in Statistics and Computing, 2019), on how MCMC can ensure privacy. For free. The original penalty algorithm of Ceperley and Dewing (1999) is a form of Metropolis-Hastings algorithm where the Metropolis-Hastings acceptance probability is replaced with an unbiased estimate (e.g., when there exists an unbiased, Normally distributed estimate of the log-acceptance ratio, λ(θ, θ′), whose exponential can be corrected to remain unbiased). In that case, the algorithm remains exact.

“Adding noise to λ(θ, θ) may help with preserving some sort of data privacy in a Bayesian framework where [the posterior], hence λ(θ, θ), depends on the data.”

Rather than being forced into replacing the Metropolis-Hastings acceptance probability with an unbiased estimate, as in pseudo-marginal MCMC, the trick here is to replace λ(θ, θ′) with a Normal perturbation thereof, hence preserving both the target (as shown by Ceperley and Dewing, 1999) and data privacy, since only a noisy likelihood ratio is released. Then, assuming that the sensitivity function of the log-likelihood [the maximum difference c(θ, θ′), over pairs of observations, between log-likelihoods at two arbitrary parameter values θ and θ′] decreases as a power of the sample size n, the penalty algorithm is differentially private after a certain number of MCMC iterations, provided the noise variance is large enough (in connection with c(θ, θ′)). Yıldırım and Ermiş (2019) show that the setting covers the case of distributed, private data, even though efficiency decreases with the number of (protected) data silos. (Another drawback is that the data owners must keep exchanging likelihood ratio estimates.)
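For the curious, here is a minimal sketch of the penalty mechanism (my own toy illustration, not the authors’ code): the log-acceptance ratio is perturbed by Normal noise of known variance σ² and corrected by −σ²/2, which is what keeps the chain exact in the Ceperley and Dewing sense. The standard Normal target and the noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def penalty_mcmc(log_ratio, sigma, theta0, proposal_sd, n_iter):
    """Penalty algorithm: Metropolis step where the log-acceptance
    ratio lambda(theta, theta') is observed with N(0, sigma^2) noise
    and corrected by -sigma^2/2 to preserve detailed balance."""
    theta = theta0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + proposal_sd * rng.normal()
        lam = log_ratio(theta, prop)        # true log-acceptance ratio
        noisy = lam + sigma * rng.normal()  # built-in (privacy) noise
        # variance correction of Ceperley & Dewing (1999)
        if np.log(rng.uniform()) < noisy - sigma**2 / 2:
            theta = prop
        chain[i] = theta
    return chain

# toy target: standard Normal, so lambda = log phi(prop) - log phi(theta)
chain = penalty_mcmc(lambda t, p: 0.5 * (t**2 - p**2), sigma=1.0,
                     theta0=0.0, proposal_sd=1.0, n_iter=20_000)
```

The penalty −σ²/2 lowers the acceptance rate, which is the efficiency price paid for exactness under noise.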

 

Bayesian differential privacy for free?

Posted in Books, pictures, Statistics with tags , , , , , , , , , , , , on September 24, 2023 by xi'an

“We are interested in the question of how we can build differentially-private algorithms within the Bayesian framework. More precisely, we examine when the choice of prior is sufficient to guarantee differential privacy for decisions that are derived from the posterior distribution (…) we show that the Bayesian statistician’s choice of prior distribution ensures a base level of data privacy through the posterior distribution; the statistician can safely respond to external queries using samples from the posterior.”

Recently I came across this 2016 JMLR paper by Christos Dimitrakakis et al. on “how Bayesian inference itself can be used directly to provide private access to data, with no modification.” This comes as a surprise, since it implies that Bayesian sampling is enough, per se, to keep the data private while making the information it conveys available. The main assumption on which this result is based is the Lipschitz continuity of the model density, namely that, for a specific (pseudo-)distance ρ,

|\log f(x|\theta)-\log f(y|\theta)|\le L\rho(x,y)

uniformly in θ over a set Θ with enough prior mass

\pi(\Theta)\ge 1-e^{-\epsilon}

for an ε>0. In this case, the Kullback-Leibler divergence between the posteriors π(θ|x) and π(θ|y) is bounded by a constant times ρ(x,y), the constant being 2L when Θ is the entire parameter space. This condition ensures differential privacy for the posterior distribution (and even more so for an associated MCMC sample); more precisely, (2L,0)-differential privacy when Θ is the entire parameter space. While there is an efficiency issue attached to the result, since the bound L is set by the model and hence immovable, this remains a fundamental result for the field (as shown by its high number of citations).
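A toy numerical check of my own (not taken from the paper): for a Bernoulli model with θ restricted to [0.2, 0.8] under a uniform prior, the log-likelihood is Lipschitz in the data with constant L = log 4 for the Hamming distance, and the log-posterior gap between two neighbouring datasets indeed stays within the 2L bound.

```python
import numpy as np

# Bernoulli model, theta restricted to [0.2, 0.8], uniform prior:
# |log f(x|theta) - log f(y|theta)| <= L |x - y| with L = log(0.8/0.2)
grid = np.linspace(0.2, 0.8, 2001)
dz = grid[1] - grid[0]
L = np.log(4)

def log_post(data):
    # log posterior of theta on the grid, normalised by a Riemann sum
    ll = sum(np.log(np.where(xi == 1, grid, 1 - grid)) for xi in data)
    return ll - np.log(np.exp(ll).sum() * dz)

x = [1, 1, 0, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 0, 0, 1, 1]   # neighbouring dataset, rho(x, y) = 1

# (2L, 0)-differential privacy: log-posterior ratio bounded by 2L rho(x, y)
gap = np.max(np.abs(log_post(x) - log_post(y)))
```

Restricting θ away from 0 and 1 is what makes L finite here; on the full parameter space (0,1) the Bernoulli log-likelihood is not Lipschitz in the data.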