Archive for Arrowleaf Cellars
Arrowleaf Cellars [pinot noir]
Posted in Statistics with tags 23w5106, Arrowleaf Cellars, British Columbia, Canada, Canadian wines, Kelowna, Okanagan Valley, Okanagan vineyards, Pacific North West, pinot noir, wildfire on October 20, 2023 by xi'an
exact yet private MCMC
Posted in Statistics with tags Arrowleaf Cellars, differential privacy, ergodicity, ICML 2023, Lake Okanagan, MCMC, Metropolis-Hastings algorithm, Okanagan vineyards, Poisson subsampling, reversibility, spectral gap, stationarity on August 9, 2023 by xi'an
“at each iteration, DP-fast MH first samples a minibatch size and checks if it uses a minibatch of data or full-batch data. Then it checks whether to require additional Gaussian noise. If so, it will instantiate the Gaussian mechanism which adds Gaussian noise to the energy difference function. Finally, it chooses accept or reject θ′ based on the noisy acceptance probability.”
Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference is an(other) ICML²³ paper, written by Wanrong Zhang and Ruqi Zhang, who run MCMC under DP constraints. For one thing, they compute the MH acceptance probability with a minibatch, which is Poisson sampled (in order to guarantee privacy). It appears as a highly calibrated algorithm (see, e.g., Algorithm 1). Under the assumption (1) that the difference between individual log densities for two values of the parameter is bounded uniformly over the data, differential privacy is established, in the sense that the MCMC output cannot reveal for certain whether a given datapoint was included. Interestingly, the usual randomisation leading to privacy is operated at the energy level, rather than on observations or summary statistics, although this may prove superfluous when there is enough randomness provided by the MH step itself: “inherent privacy guarantees in the MH algorithm”.
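To get a feel for how much noise the Gaussian mechanism quoted above injects, here is a minimal sketch of the classical calibration found in Dwork and Roth's monograph, σ = Δ₂·√(2 log(1.25/δ))/ε, valid for 0 < ε < 1. The function name `gaussian_sigma` and the unit sensitivity are illustrative assumptions of mine: the paper's own calibration of the noise on the energy difference may well differ.

```python
import math


def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Noise scale of the classical Gaussian mechanism.

    Adding N(0, sigma^2) noise to a statistic with L2-sensitivity `sensitivity`
    gives (eps, delta)-DP when 0 < eps < 1 and 0 < delta < 1.
    """
    if not 0 < eps < 1:
        raise ValueError("this calibration is only valid for 0 < eps < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps


# Hypothetical unit sensitivity for the energy difference
for eps, delta in [(0.9, 1e-3), (0.5, 1e-5), (0.1, 1e-5)]:
    print(f"eps={eps}, delta={delta}: sigma={gaussian_sigma(1.0, eps, delta):.2f}")
```

Shrinking either ε or δ inflates σ, hence noisier acceptance decisions.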
“when either the privacy hyperparameter ϵ or δ becomes small, the convergence rate becomes small, characterizing how much the privacy constraint slows down the convergence speed of the Markov chain”
The major results of the paper are privacy guarantees (at each iteration) and the preservation of the proper target distribution, in contrast with earlier approaches. In particular, adding Gaussian noise to the energy does not impact reversibility. (Even though I am not 100% sure I buy the entire argument about reversibility (in Appendix C), as it sounds too easy!) The authors even obtain a bound on the relative spectral gaps.
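That noise on the energy difference need not destroy the target can be checked on a toy example. The sketch below is not the paper's DP-fast MH: it uses full-batch data (no Poisson minibatch), a Gaussian-mean model with a closed-form posterior, and swaps in the Ceperley and Dewing "penalty" correction, which accepts with probability min(1, exp(Δ̂ − σ²/2)) when Δ̂ ~ N(Δ, σ²) with known σ, a rule that satisfies detailed balance exactly. The noise scale `sigma_noise` is an arbitrary assumption, not calibrated to any (ε, δ).

```python
import numpy as np


def log_target_diff(theta_new, theta_old, x, prior_sd):
    """Log posterior difference for x_i ~ N(theta, 1) with a N(0, prior_sd^2) prior."""
    loglik = np.sum(-0.5 * (x - theta_new) ** 2 + 0.5 * (x - theta_old) ** 2)
    logprior = -0.5 * (theta_new**2 - theta_old**2) / prior_sd**2
    return loglik + logprior


def noisy_energy_mh(x, n_iter=20_000, step=0.3, sigma_noise=1.0, prior_sd=10.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()  # symmetric random-walk proposal
        delta = log_target_diff(prop, theta, x, prior_sd)
        # Gaussian noise added at the energy level (privacy-style randomisation)
        noisy_delta = delta + sigma_noise * rng.normal()
        # Penalty correction: subtracting sigma^2 / 2 restores detailed balance
        if np.log(rng.uniform()) < noisy_delta - 0.5 * sigma_noise**2:
            theta = prop
        chain[t] = theta
    return chain


rng = np.random.default_rng(1)
x = rng.normal(1.5, 1.0, size=50)
chain = noisy_energy_mh(x)[2_000:]  # drop burn-in
post_prec = len(x) + 1 / 10.0**2
print("exact posterior mean/sd:", x.sum() / post_prec, post_prec**-0.5)
print("noisy MH mean/sd:       ", chain.mean(), chain.std())
```

Dropping the `- 0.5 * sigma_noise**2` term yields a chain whose stationary distribution is, in general, no longer the exact posterior.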
Contextual Integrity for Differential Privacy #4 [23w5106]
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags 100th birthday, Arrowleaf Cellars, Banff International Research Station for Mathematical Innovation, BIRS, British Columbia, Canada, Canadian wines, contextual integrity, covidtracker, data analysis, differential privacy, ethics, full moon, GDPR, information theory, Kelowna, Lake Okanagan, large language models, natural language processing, Okanagan Valley, Okanagan vineyards, philosophy of sciences, replicability, synthetic data, UBCO, winery, workshop on August 5, 2023 by xi'an
Mostly short talks. First talk by Thomas Steinke (Google) on interpreting ε, prompting a side wondering of mine on the relation between exp(ε) and the uncertainty that comes with a Monte Carlo outcome. Which may relate to this 2022 paper by Ruobin Gong. Second talk by Gautam Kamath (U Waterloo) on large language models under privacy with “public” data, questioning the appropriateness of ML benchmarks in terms of privacy. Third talk by Mark Bun (Boston U) on replicability, privacy and adaptive generalisation in machine learning, with a strange criticism of confidence intervals on the same parameter not intersecting for two independent studies. And proposing high-probability replicable algorithms that can be put in duality with differentially private algorithms, at the cost of lowering precision and effective sample size. We also had another group discussion on how to reach out about privacy guarantees, which made me realise there was GDPR compliance software available.
In the afternoon session, Shlomi Hod (Boston U) presented a practical case of designing a privacy-preserving protocol for the Israeli birth records. With strong opposition from stakeholders to using synthetic data, due to a semantic drift from synthetic to manipulated to fake, to lying. Wanrong Zhang did not talk about her stunning recent ICML paper but instead about another practical case connected with mobile-based Covid case predictions, adding minimal noise to mobility data. Nidhi Hegde (U Alberta) gave up on talking about Thompson sampling with privacy protection, to focus on an ongoing health application in Alberta, as more suited to the workshop. And Ria Safavi-Naini (U Calgary) drew a parallel between information theory and DP versus CI.
While the workshop was scheduled till Friday noon, in usual BIRS fashion (!), the morning session was cancelled, as most people were leaving Kelowna that morning.
Contextual Integrity for Differential Privacy #3 [23w5106]
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags Arrowleaf Cellars, Banff International Research Station for Mathematical Innovation, BIRS, British Columbia, Canada, Canadian wines, causal inference, contextual integrity, data analysis, differential privacy, ERC, ethics, Frind Estate winery, GDPR, George Box, John Maynard Keynes, Kelowna, Lake Okanagan, Maréchal Foch, mass spectrometer, Okanagan Valley, Okanagan vineyards, philosophy of sciences, pinot noir, record linkage, Syrah, UBCO, winery, workshop on August 4, 2023 by xi'an
Morning of diverse short talks. First talk by Bei Jiang (Edmonton) on locally processed privacy for quantile estimation, which relates very much to our ongoing research with Stan, who is starting his ERC-funded PhD on privacy. Randomised response, in the sense of having a positive probability of replacing the indicators in the empirical cdf by random or perturbed versions whose bias can be corrected. I may have overdone the similarity though, in confusing users with agents. Followed by a hacking foray by Joel Reardon (Calgary) into how much information apps transmit about completely unrelated phone activity. (Moral: Never send a bug report.)
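The randomised-response idea for quantiles can be sketched as follows, under a design of my own that is hypothetical and not necessarily the one in the talk: each user is assigned a single threshold t from a fixed grid, reports the indicator 1{x ≤ t} truthfully with probability p = e^ε/(1+e^ε) and flipped otherwise (Warner's mechanism, ε-locally private for one bit), and the analyst debiases via F̂(t) = (mean − (1−p))/(2p − 1) before reading off the quantile. The grid, sample size, and the name `rr_quantile` are illustrative.

```python
import numpy as np


def rr_quantile(x, q, grid, eps, seed=0):
    """Estimate the q-quantile of x from one randomised-response bit per user."""
    rng = np.random.default_rng(seed)
    p_keep = np.exp(eps) / (1 + np.exp(eps))  # Warner: keep the true bit w.p. p_keep
    # Each user answers about one randomly assigned threshold only
    idx = rng.integers(len(grid), size=len(x))
    true_bits = (x <= grid[idx]).astype(float)
    flipped = rng.uniform(size=len(x)) > p_keep
    reports = np.where(flipped, 1.0 - true_bits, true_bits)
    cdf_hat = np.empty(len(grid))
    for j in range(len(grid)):
        r = reports[idx == j]
        if r.size == 0:
            raise ValueError(f"no user assigned to threshold {grid[j]}")
        # E[report] = (2p - 1) F(t) + (1 - p), inverted to remove the bias
        cdf_hat[j] = (r.mean() - (1 - p_keep)) / (2 * p_keep - 1)
    # Project onto valid, non-decreasing cdf values
    cdf_hat = np.maximum.accumulate(np.clip(cdf_hat, 0.0, 1.0))
    j = np.searchsorted(cdf_hat, q)  # first threshold with F_hat(t) >= q
    return grid[min(j, len(grid) - 1)], cdf_hat


rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
grid = np.linspace(-3, 3, 61)
median_hat, cdf_hat = rr_quantile(x, 0.5, grid, eps=1.0)
print("estimated median:", median_hat)
```

Assigning one threshold per user keeps each person's disclosure to a single bit, at the cost of splitting the sample across the grid.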
The afternoon break saw us visiting the Frind Estate winery on the other side of the lake. Meaning not only wine tasting (great Syrah!) and discovering a hybrid grape called Maréchal Foch, but also entering the lab with its mass spectrometer. (But no glimpse of the winemaking process per se…)
Contextual Integrity for Differential Privacy #2 [23w5106]
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags Arrowleaf Cellars, Banff International Research Station for Mathematical Innovation, BIRS, British Columbia, Canada, Canadian wines, causal inference, contextual integrity, data analysis, differential privacy, ethics, GDPR, George Box, John Maynard Keynes, Kelowna, Lake Okanagan, Okanagan Valley, Okanagan vineyards, philosophy of sciences, pinot noir, record linkage, UBCO, winery, workshop on August 3, 2023 by xi'an
Morning of diverse short talks. First one on What are the chances? Explaining ε to end users by presenting odds and illustrating the impact of including one potential user’s data. Then one on re-placing DP within CI in terms of causality, and multi-agent models, illustrated by the Cambridge Analytica scandal. I am still not getting the point of the CI perspective, which sounds to me like an impossibility theorem. A bit as if Statistics had stopped at “All models are wrong” (as Keynes did, in a way). And a talk on Uses & misuses of DP inference, with nice drawings explaining that publicly available information (e.g., smoking causes cancer) may create breaches of privacy (Alice may have cancer). Last talk of the morning on framing effects as privileging data processors and overly technical? Fundamental law of information privacy? Got me wondering about the lack (?) of a dynamic perspective so far, in the (simplistic?) sense that DP does not seem to account for potential breaches were a secondary dataset to become available with shared subjects and record linkage. (A bit of a go at GDPR, for the second time within a week.)
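The odds-based explanation of ε can be made concrete through one common Bayesian reading of pure ε-DP: for two neighbouring datasets differing only in one person's record, observing the output multiplies an observer's prior odds between them by at most e^ε. The helper below (a hypothetical name, not taken from the talk) turns this into an upper bound on the posterior probability that the record was included.

```python
import math


def max_posterior(prior: float, eps: float) -> float:
    """Upper bound on the posterior probability that one person's record was used,
    given prior probability `prior`, under pure eps-DP (odds grow by at most e^eps)."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie in (0, 1)")
    post_odds = prior / (1 - prior) * math.exp(eps)
    return post_odds / (1 + post_odds)


for eps in (0.1, 1.0, math.log(3), 5.0):
    print(f"eps={eps:.2f}: prior 0.5 -> posterior at most {max_posterior(0.5, eps):.3f}")
```

For instance, ε = log 3 turns even odds into at most 3:1, i.e. a posterior probability of at most 0.75.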
Before, I had a rather nice early morning in the woods above Okanagan Lake, crossing many white-tailed deer (hopefully no ticks!) as well as No trespassing signs. And a quick and c…ool swim in the Lake's 20° waters. No sign of the large wildfires raging south in Osoyoos or north in Kamloops. We had a fantastic lunch break at the nearby Arrowleaf Cellars winery, with a stellar pinot noir, although this rather made the following working session harder to engage with (not to mention the lingering jetlag)!