Archive for Arrowleaf Cellars

Arrowleaf Cellars [pinot noir]

Posted in Statistics on October 20, 2023 by xi'an

exact yet private MCMC

Posted in Statistics on August 9, 2023 by xi'an

“at each iteration, DP-fast MH first samples a minibatch size and checks if it uses a minibatch of data or full-batch data. Then it checks whether to require additional Gaussian noise. If so, it will instantiate the Gaussian mechanism which adds Gaussian noise to the energy difference function. Finally, it chooses accept or reject θ′ based on the noisy acceptance probability.”

Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference is an(other) ICML²³ paper, written by Wanrong Zhang and Ruqi Zhang, who are running MCMC under differential privacy (DP) constraints. For one thing, they compute the MH acceptance probability with a minibatch, which is Poisson sampled (in order to guarantee privacy). It appears as a highly calibrated algorithm (see, e.g., Algorithm 1). Under the assumption (1) that the difference between individual log densities for two values of the parameter is upper bounded (in the data), differential privacy is established as the impossibility of detecting for certain a given datapoint from the MCMC output. Interestingly, the usual randomisation leading to privacy is operated on the energy level, rather than on observations or summary statistics, although this may prove superfluous when there is enough randomness provided by the MH step itself: “inherent privacy guarantees in the MH algorithm”.
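As a rough illustration of the ingredients listed above (Poisson-sampled minibatch, bounded individual log-density differences, Gaussian noise on the energy difference), here is a minimal Python sketch. It is emphatically not the paper's Algorithm 1: the function name, the flat prior, the scalar random-walk proposal, the clipping, and the parameters `rate`, `bound` and `sigma` are all assumptions made for the illustration, and this naive version does not in general preserve the exact target, which is precisely what the paper's calibration is about.

```python
import numpy as np

def noisy_minibatch_mh_step(theta, data, log_lik, rng,
                            step=0.1, rate=0.2, bound=1.0, sigma=1.0):
    """One naive noisy minibatch MH step (illustrative only, flat prior, scalar theta)."""
    # symmetric random-walk proposal
    proposal = theta + step * rng.normal()
    # Poisson sampling: each datapoint joins the minibatch independently with prob. `rate`
    mask = rng.random(data.shape[0]) < rate
    batch = data[mask]
    # individual log-density differences, clipped to [-bound, bound]
    # (hypothetical stand-in for the boundedness assumption)
    diffs = np.clip(log_lik(proposal, batch) - log_lik(theta, batch), -bound, bound)
    # rescaled minibatch estimate of the full-data log ratio
    delta = diffs.sum() / rate
    # Gaussian mechanism applied to the energy difference
    noisy_delta = delta + sigma * rng.normal()
    # accept or reject based on the noisy acceptance probability
    accept = np.log(rng.random()) < min(0.0, noisy_delta)
    return (proposal if accept else theta), bool(accept)
```

With `rate=1.0`, `sigma=0.0` and `bound=np.inf`, the step reduces to a standard random-walk Metropolis step on the full data, which gives a simple sanity check.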

“when either the privacy hyperparameter ϵ or δ becomes small, the convergence rate becomes small, characterizing how much the privacy constraint slows down the convergence speed of the Markov chain”

The major results of the paper are privacy guarantees (at each iteration) and preservation of the proper target distribution, in contrast with earlier versions. In particular, adding the Gaussian noise to the energy does not impact reversibility. (Even though I am not 100% sure I buy the entire argument about reversibility (in Appendix C) as it sounds too easy!) The authors even achieve a bound on the relative spectral gaps.

Contextual Integrity for Differential Privacy #4 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 5, 2023 by xi'an

Mostly short talks. First talk by Thomas Steinke (Google) on interpreting ε, with a side wondering of mine on the relation between exp(ε) and the uncertainty that comes with Monte Carlo outcome. Which may relate to this 2022 paper by Ruobin Gong. Second talk by Gautam Kamath (U Waterloo) on large language models under privacy with “public” data. Questioning the appropriateness of ML benchmarks in terms of privacy. Third talk by Mark Bun (Boston U) on replicability, privacy and adaptive generalisation in machine learning, with a strange criticism of confidence intervals on the same parameter not intersecting for two independent studies. And proposing high probability replicable algorithms that can be put in duality with differentially private algorithms at the cost of lowering precision and effective sample size. We also had another group discussion on how to reach out about privacy guarantees, which made me realise there was GDPR compliance software available.

In the afternoon session, Shlomi Hod (Boston U) presented a practical case of designing a privacy preserving protocol for the Israeli birth record. With a strong opposition from stakeholders to use synthetic data, due to a semantic drift from synthetic to manipulated to fake, to lying. Wanrong Zhang did not talk about her stunning recent ICML paper but instead about another practical case connected with mobile-based Covid case predictions, by adding minimal noise to mobility data. Nidhi Hegde (U Alberta) gave up talking on Thompson sampling with privacy protection, to focus on an ongoing health application for Alberta as more suited for the workshop. And Rei Safavi-Naini (U Calgary) drew a parallel between information theory and DP versus CI.

While the workshop was scheduled till Friday noon, as per usual BIRS habits (!), the morning session was cancelled since most people were leaving Kelowna in the morning.

Contextual Integrity for Differential Privacy #3 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 4, 2023 by xi'an

Morning of diverse short talks. First talk by Bei Jiang (Edmonton) on locally processed privacy for quantile estimation, which relates very much to our ongoing research with Stan, who is starting his ERC funded PhD on privacy. The approach is a form of randomised response, with a positive probability of replacing indicators in the empirical cdf by a random or perturbed version whose bias can be corrected. I may have overdone the similarity though in confusing users with agents. Followed by a hacking foray by Joel Reardon (Calgary) into how much information is transmitted by apps on completely unrelated phone activity. (Moral: Never send a bug report.)
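To make the randomised response idea from the first talk concrete, here is a minimal Python sketch under assumptions of my own choosing (it is not the speaker's method): each indicator 1{x ≤ t} is reported truthfully with probability `p` and replaced by a fair coin flip otherwise. Since the expected report is p·F(t) + (1 − p)/2, the bias of the perturbed empirical cdf can be removed in closed form. The function name and arguments are hypothetical.

```python
import numpy as np

def randomised_response_cdf(x, t, p, rng):
    """Debiased estimate of F(t) from randomised-response indicators (illustrative)."""
    # true indicators entering the empirical cdf at t
    truth = (x <= t).astype(float)
    # each indicator is kept with probability p ...
    keep = rng.random(x.shape[0]) < p
    # ... and otherwise replaced by an independent fair coin flip
    coin = rng.integers(0, 2, size=x.shape[0]).astype(float)
    reports = np.where(keep, truth, coin)
    # E[report] = p * F(t) + (1 - p) / 2, hence the bias correction
    return (reports.mean() - (1.0 - p) / 2.0) / p
```

In this particular scheme the true indicator is reported with probability (1 + p)/2 and the opposite one with probability (1 − p)/2, so the local privacy level is ε = log((1 + p)/(1 − p)), and the price of a smaller `p` is a variance inflated by a factor 1/p².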

The afternoon break saw us visiting the Frind Estate winery on the other side of the lake. Meaning not only wine tasting (great Syrah!), and discovering a hybrid grape called Maréchal Foch, but also entering the lab with its mass spectrometer. (But no glimpse of the winemaking process per se…)

Contextual Integrity for Differential Privacy #2 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 3, 2023 by xi'an

Morning of diverse short talks. First one on What are the chances? Explaining ε towards end users by presenting odds and illustrating the impact of including one potential user’s data. Then one on re-placing DP within CI in terms of causality. And multi-agent models, illustrated by the Cambridge Analytica scandal. I am still not getting the point of the CI perspective which sounds to me like an impossibility theorem. A bit as if Statistics had stopped at “All models are wrong” (as Keynes did, in a way). And a talk on Uses & misuses of DP inference, with nice drawings explaining that publicly available information (eg, smoking causes cancer) may create breaches of privacy (Alice may have cancer). Last talk of the morning on framing effects as privileging data processors and overly technical? Fundamental law of information privacy? Got me wondering about the lack (?) of dynamic perspective so far, in the (simplistic?) sense that DP does not seem to account for potential breaches were a secondary dataset to become available with shared subjects and record linkage. (A bit of a go at GDPR, for the second time within a week.)
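On the odds presentation of ε mentioned in the first talk, a minimal Python sketch of the standard Bayesian reading of pure ε-DP may help (this is a generic illustration, not taken from the talk, and the function name is hypothetical): since the likelihood ratio between two neighbouring datasets is at most exp(ε), the posterior odds that a given person's data was included are at most exp(ε) times the prior odds.

```python
import math

def max_posterior_probability(prior, epsilon):
    """Upper bound on the posterior probability of inclusion under pure epsilon-DP."""
    # prior odds that the person's data is in the dataset
    prior_odds = prior / (1.0 - prior)
    # the likelihood ratio is bounded by exp(epsilon), hence so is the odds update
    posterior_odds = math.exp(epsilon) * prior_odds
    # convert the bounding odds back into a probability
    return posterior_odds / (1.0 + posterior_odds)
```

For instance, with a prior of 1/2 and ε = log 3, the odds can move from 1:1 to at most 3:1, that is, a posterior probability of at most 3/4. This bound only covers pure ε-DP and says nothing about the (ε, δ) relaxation.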

Before, I had a rather nice early morning in the woods on top of Okanagan Lake, crossing many white-tailed deer, hopefully no ticks!, as well as No trespassing signs. And a quick and c…ool swim in the Lake 20° waters. No sign of the large wildfires raging south in Osoyoos or north in Kamloops. We had a fantastic lunch break at the nearby Arrowleaf Cellars winery, with a stellar pinot noir, although this rather made the following working session harder to engage with (not to mention the lingering jetlag)!