Archive for George Box

Contextual Integrity for Differential Privacy #3 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 4, 2023 by xi'an

Morning of diverse short talks. First talk by Bei Jiang (Edmonton) on local privacy for quantile estimation, which relates very much to our ongoing research with Stan, who is starting his ERC-funded PhD on privacy. Randomised response operates here by replacing, with positive probability, the indicators making up the empirical cdf with random or perturbed versions, whose bias can then be corrected (a toy version below). I may have overdone the similarity, though, in confusing users with agents. Followed by a hacking foray by Joel Reardon (Calgary) into how much information is transmitted by apps on completely unrelated phone activity. (Moral: Never send a bug report.)
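As a toy rendering of that mechanism (my own sketch, not Bei Jiang's actual procedure; the flip probability and the debiasing step are the standard randomised-response ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomised_response(bits, p_keep):
    # each user keeps the true bit with probability p_keep and
    # otherwise reports a uniformly random bit (local privacy)
    keep = rng.random(bits.shape) < p_keep
    return np.where(keep, bits, rng.integers(0, 2, bits.shape))

def private_cdf_at(x, t, p_keep):
    # E[reported] = p_keep * F(t) + (1 - p_keep)/2, hence the correction
    reported = randomised_response((x <= t).astype(int), p_keep)
    return (reported.mean() - (1 - p_keep) / 2) / p_keep

x = rng.normal(size=10_000)
print(private_cdf_at(x, 0.0, p_keep=0.5))  # close to F(0) = 1/2
```

A quantile estimate then obtains by inverting t ↦ F̂(t), e.g., after monotonising the debiased cdf.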

The afternoon break saw us visiting the Frind Estate winery on the other side of the lake. Meaning not only wine tasting (great Syrah!) and discovering a hybrid grape called Maréchal Foch, but also entering the lab with its mass spectrometer. (But no glimpse of the winemaking process per se…)

Contextual Integrity for Differential Privacy #2 [23w5106]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life, Wines on August 3, 2023 by xi'an

Morning of diverse short talks. First one on What are the chances? Explaining ε to end-users by presenting odds and illustrating the impact of including one potential user's data (the odds reading of ε is recalled below). Then one on re-placing DP within CI in terms of causality. And multi-agent models, illustrated by the Cambridge Analytica scandal. I am still not getting the point of the CI perspective, which sounds to me like an impossibility theorem. A bit as if Statistics had stopped at “All models are wrong” (as Keynes did, in a way). And a talk on Uses & misuses of DP inference, with nice drawings explaining that publicly available information (e.g., smoking causes cancer) may create breaches of privacy (Alice may have cancer). Last talk of the morning on framing effects, as privileging data processors and being overly technical? Fundamental law of information privacy? Got me wondering about the lack (?) of a dynamic perspective so far, in the (simplistic?) sense that DP does not seem to account for potential breaches were a secondary dataset to become available with shared subjects and record linkage. (A bit of a go at GDPR, for the second time within a week.)
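For reference, the odds interpretation of ε underlying such explanations (the standard differential privacy guarantee, my rendering rather than the talk's slides): for an ε-DP mechanism M and neighbouring datasets D and D′ differing in one user's record,

```latex
\frac{\mathbb{P}\{M(D)\in S\}}{\mathbb{P}\{M(D')\in S\}} \;\le\; e^{\varepsilon}
\qquad \text{for every measurable set } S,
```

so including one user's data shifts an observer's odds on any conclusion by a factor of at most e^ε, roughly 1+ε for small ε: with ε = 0.1, the odds move by at most about 10%.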

Before, I had a rather nice early morning in the woods on top of Okanagan Lake, crossing many white-tailed deer (hopefully no ticks!), as well as No trespassing signs. And a quick and c…ool swim in the lake's 20° waters. No sign of the large wildfires raging south in Osoyoos or north in Kamloops. We had a fantastic lunch break at the nearby Arrowleaf Cellars winery, with a stellar pinot noir, although this rather made the following working session harder to engage with (not to mention the lingering jetlag)!

RSS Read Paper

Posted in Books, pictures, Statistics, Travel, University life on April 17, 2017 by xi'an

I had not attended a Read Paper session at the Royal Statistical Society in Errol Street for quite a while and hence it was a real treat to be back there, especially as a seconder of the vote of thanks for the paper of Andrew Gelman and Christian Hennig. (I realised on this occasion that I had always been invited as a seconder, who in the tradition of the Read Papers is expected to be more critical of the paper. When I mentioned that to a friend, he replied they knew me well!) Listening to Andrew (with no slide) and Christian made me think further about the foundations of statistics and the reasons why we proceed as we do. In particular about the meaning and usages of a statistical model. Which is only useful (in the all models are wrong meme) if the purpose of the statistical analysis is completely defined. Searching for the truth does not sound good enough. And this brings us back full circle, in my opinion, to decision theory, which should be part of the whole picture, along with the virtues of openness, transparency, and communication.

During his talk, Christian mentioned outliers as a delicate issue in modelling and I found this a great example of a notion with no objective meaning, in that it is only defined in terms of, or against, a model: it addresses the case of observations not fitting a model, rather than a model not fitting some observations, hence as much a case of incomplete (lazy?) modelling as an issue of difficult inference.

And a discussant (whose Flemish name I alas do not remember) came with the slide below, an etymological reminder that originally (as in Aristotle) the meanings of objectivity and subjectivity were inverted, in that the latter was about the intrinsic nature of the object, while the former was about the perception of this object. It is only in the modern (?) era that Immanuel Kant reversed the meanings…

Last thing, I plan to arXiv my discussions, so feel free to send me yours to add to the arXiv document. And make sure to spread the word about this discussion paper to all O-Bayesians, as they should feel concerned about this debate!

all models are wrong

Posted in Statistics, University life on September 27, 2014 by xi'an

“Using ABC to evaluate competing models has various hazards and comes with recommended precautions (Robert et al. 2011), and unsurprisingly, many if not most researchers have a healthy scepticism as these tools continue to mature.”

Michael Hickerson just published an open-access letter with the above title in Molecular Ecology. (As in several earlier papers, incl. the (in)famous ones by Templeton, Hickerson confuses running an ABC algorithm with conducting Bayesian model comparison, but this is not the main point of this post.)

“Rather than using ABC with weighted model averaging to obtain the three corresponding posterior model probabilities while allowing for the handful of model parameters (θ, τ, γ, Μ) to be estimated under each model conditioned on each model’s posterior probability, these three models are sliced up into 143 ‘submodels’ according to various parameter ranges.”

The letter is in fact a supporting argument for the earlier paper of Pelletier and Carstens (2014, Molecular Ecology), which conducted the above splitting experiment. I could not read this paper, so I cannot judge the relevance of splitting the parameter range in this way. From what I understand, it amounts to using mutually exclusive priors with different supports.

“Specifically, they demonstrate that as greater numbers of the 143 sub-models are evaluated, the inference from their ABC model choice procedure becomes increasingly.”

An interestingly cut sentence. Increasingly unreliable? mediocre? weak?

“…with greater numbers of models being compared, the most probable models are assigned diminishing levels of posterior probability. This is an expected result…”

True, if the number of models under consideration increases, then under a uniform prior over the model indices the posterior probability of any given model mechanically decreases. But the pairwise Bayes factors should not be impacted by the number of models under comparison, as the display below makes explicit.
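Indeed, writing m_k for the marginal likelihood of model M_k among K models with uniform prior weights,

```latex
\pi(M_k \mid x) = \frac{m_k(x)}{\sum_{j=1}^{K} m_j(x)}
\qquad\text{while}\qquad
B_{kj}(x) = \frac{m_k(x)}{m_j(x)},
```

so every model added to the collection inflates the denominator of the posterior probability, whereas the pairwise Bayes factor between two given models never involves K. And yet the letter by Hickerson states that Pelletier and Carstens found the opposite: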

“…pairwise Bayes factor[s] will always be more conservative except in cases when the posterior probabilities are equal for all models that are less probable than the most probable model.”

Which means that the “Bayes factor” in this study is computed as the ratio of a marginal likelihood to a compound (or super-marginal) likelihood, averaged over all models and hence incorporating the prior probabilities of the model indices as well. I had never encountered such a proposal before. Contrary to the letter's claim:

“…using the Bayes factor, incorporating all models is perhaps more consistent with the Bayesian approach of incorporating all uncertainty associated with the ABC model choice procedure.”

Besides the needless inclusion of ABC, this is a somewhat confusing sentence, as Bayes factors are not, stricto sensu, Bayesian procedures, since they remove the prior probabilities of the models from the picture.
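For the record, my reading of the construction (an assumption on my part, as I could not access the original computation): the reported “Bayes factor” for model M_k seems to be

```latex
\widetilde{B}_k(x) \;=\; \frac{m_k(x)}{\sum_{j=1}^{K} P(M_j)\, m_j(x)},
```

which, contrary to a pairwise Bayes factor, depends on the prior probabilities of all the model indices and hence on the size of the model collection.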

“Although the outcome of model comparison with ABC or other similar likelihood-based methods will always be dependent on the composition of the model set, and parameter estimates will only be as good as the models that are used, model-based inference provides a number of benefits.”

All models are wrong, but the very fact that they are models allows for producing pseudo-data from those models and for checking whether the pseudo-data is similar enough to the observed data, in the components that matter most for the experimenter. Hence a loss function of sorts… (A minimal sketch of such a check follows.)
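As an illustration of this use of a model as a pseudo-data generator (with a hypothetical Normal model and an arbitrary summary, not the models of the study):

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for observed data and a model fitted to it
x_obs = rng.standard_t(df=3, size=200)       # heavier tails than Normal
mu_hat, sd_hat = x_obs.mean(), x_obs.std()

def summary(x):
    # the component that matters most to the experimenter,
    # here (arbitrarily) the mass beyond three units
    return np.mean(np.abs(x) > 3)

# pseudo-data replications under the fitted Normal model
pseudo = np.array([summary(rng.normal(mu_hat, sd_hat, x_obs.size))
                   for _ in range(1_000)])
tail_prob = np.mean(pseudo >= summary(x_obs))
print(f"observed summary {summary(x_obs):.3f}, check probability {tail_prob:.3f}")
```

A small check probability signals that, in this specific component, the pseudo-data does not resemble the observed data, which is the implicit loss function of the check.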

interesting mis-quote

Posted in Books, pictures, Statistics, Travel, University life on September 25, 2014 by xi'an

At a recent conference on Big Data, one speaker mentioned this quote from Peter Norvig, the director of research at Google:

“All models are wrong, and increasingly you can succeed without them.”

A quote that I found rather shocking, esp. when considering the amount of modelling behind Google tools. And coming from someone citing Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini as one of his favourite books and Bayesian Data Analysis as another one… Or displaying Bayes [or his alleged portrait] and Turing on his book cover. So I went searching on the Web for more information about this surprising quote. And found the explanation, as given by Peter Norvig himself:

“To set the record straight: That’s a silly statement, I didn’t say it, and I disagree with it.”

Which means that weird quotes have a high probability of being misquotes. And of being used by others to (obviously) support their own agenda. In the current case, Chris Anderson and his End of Theory paradigm. Briefly and mildly discussed by Andrew a few years ago.