## Ocean’s four!

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , on October 25, 2022 by xi'an

Fantastic news! The ERC-Synergy¹ proposal we submitted last year with Michael Jordan, Éric Moulines, and Gareth Roberts has been selected by the ERC (which explains for the trips to Brussels last month). Its acronym is OCEAN [hence the whale pictured by a murmuration of starlings!], which stands for On intelligenCE And Networks​: Mathematical and Algorithmic Foundations for Multi-Agent Decision-Making​. Here is the abstract, which will presumably turn public today along with the official announcement from the ERC:

Until recently, most of the major advances in machine learning and decision making have focused on a centralized paradigm in which data are aggregated at a central location to train models and/or decide on actions. This paradigm faces serious flaws in many real-world cases. In particular, centralized learning risks exposing user privacy, makes inefficient use of communication resources, creates data processing bottlenecks, and may lead to concentration of economic and political power. It thus appears most timely to develop the theory and practice of a new form of machine learning that targets heterogeneous, massively decentralized networks, involving self-interested agents who expect to receive value (or rewards, incentive) for their participation in data exchanges.

OCEAN will develop statistical and algorithmic foundations for systems involving multiple incentive-driven learning and decision-making agents, including uncertainty quantification at the agent’s level. OCEAN will study the interaction of learning with market constraints (scarcity, fairness), connecting adaptive microeconomics and market-aware machine learning.

OCEAN builds on a decade of joint advances in stochastic optimization, probabilistic machine learning, statistical inference, Bayesian assessment of uncertainty, computation, game theory, and information science, with PIs having complementary and internationally recognized skills in these domains. OCEAN will shed a new light on the value and handling data in a competitive, potentially antagonistic, multi-agent environment, and develop new theories and methods to address these pressing challenges. OCEAN requires a fundamental departure from standard approaches and leads to major scientific interdisciplinary endeavors that will transform statistical learning in the long term while opening up exciting and novel areas of research.

Since the ERC support in this grant mostly goes to PhD and postdoctoral positions, watch out for calls in the coming months or contact us at any time.

## prior elicitation

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , on January 13, 2022 by xi'an

“We believe that an elicitation method should support elicitation both in the parameter and observable space, should be model-agnostic, and should be sample-efficient since human effort is costly.”

Petrus Mikkola et al. arXived a long paper on prior elicitation addressing the (most relevant) question: Why are we not widely use prior elicitation? With a massive bibliography that could be (partly) commented (and corrected as some references are incomplete, as eg my book chapter on priors!). I think the paper would make a terrific discussion paper.

The absence of a general procedure for prior elicitation is indeed hindering the adoption of Bayesian methods outside our core community and is thus eventually detrimental to their wider development. It also carries the dangers of misled or misleading prior choices. The authors put forward the absence of “software that integrates well with the current probabilistic programming tools used for other parts of the modelling workflow.” This requires setting principles that avoid “just-press-key” solutions. (Aside: This reminds me of my very first prospective PhD student, who was then working in a startup [although the name was not yet in use in the early 1990’s!] and had build such a software in a discretised, low dimension, conjugate prior, environment by returning a form of decision-theoretic impact of the chosen hyperparameters. He alas aborted his PhD attempt due to the short-term pressing matters in the under-staffed company…)

“We inspect prior elicitation from the perspectives of (1) properties of the prior distribution itself, (2) the model family and the prior elicitation method’s dependence on it, (3) the underlying elicitation space, (4) how the method interprets the information provided by the expert, (5) computation, (6) the form and quantity of interaction with the expert(s), and (7) the assumed capability of the expert (…)”

Prior elicitation is indeed a delicate balance between incorporating expert opinion(s) and avoiding over-standardisation. In my limited experience, experts tend to be over-confident about their own opinion and unwilling to attach uncertainty to their assessments. Even when being inconsistent. When several experts are involved (as, very briefly, in Section 3.6), building a common prior quickly becomes a challenge, esp. if their interests (or utility functions) diverge. As illustrated in the case of the whaling commission analysed by Adrian Raftery in the late 1990’s. (The above quote involves a single expert.) Actually, I dislike the term expert altogether, as it comes without any grading of the reliability of the person.To hit (!) at an early statement in the paper (p.5), should the prior elicitation always depend on the (sampling) model, as experts may ignore or misapprehend the model? The posterior already accounts for the likelihood and the parameter may pre-exist wrt the model, as eg cosmological constants or vaccine efficiency… In a sense, the model should be involved as little as possible in the elicitation as the expert could confuse her beliefs about the parameter with those about the accuracy of the model. (I realise this is not necessarily a mainstream position as illustrated by this paper by Andrew and friends!)

And isn’t the first stumbling block the inability of most to represent one’s prior knowledge in probabilistic terms? Innumeracy is a shared shortcoming in the general population (and since everyone’s an expert!), as repeatedly demonstrated since the start of the Covid-19 pandemic. (See also the above point about inconsistency. Accounting for such inconsistencies in a Bayesian way is a natural answer, albeit requiring the degree of expertise and reliability to be tested.)

Is prior elicitation feasible beyond a few dimensions? Even when using the constrictive tool of copulas one hits a wall after a few dimensions, assuming the expert is willing to set a prior correlation matrix.  Most of the methods described in Section 3.1 only apply to textbook examples. In their third dimension (!), the authors mention neural network parameters but later fail to cover this type of issue. (This was the example I had in mind indeed.) And they move from parameter space to observable space. Distinguishing predictive elicitation from observational elicitation, the former being what I would have suggested from scratch. Obviously, the curse of dimensionality strikes again unless one considers summary statistics (like in ABC).

While I am glad conjugate priors do not get the lion’s share, using as in Section 3.3.. non-parametric or machine learning solutions to construct the prior sounds unrealistic. (And including maximum entropy priors into that category seems wrong since they are definitely parametric.)

The proposed Bayesian treatment of the expert’s “data” (Section 4.1) is rational but requires an additional model construct to link the expert’s data with the parameter to reach a Bayes formula like (4.1). Plus a primary prior (which could then be one of the reference priors.) Reducing the expert’s input to imaginary observations may prove too narrow, though. The notion of an iterative elicitation is most appealing and its sequential aspect may not be particularly problematic in opposition to posteriors relying on using the data twice or more. I am much less buying the hierarchical construct of Section 4.3 because they imply a return to conjugate priors and hyperpriors, are not necessarily correctly understood by experts, do not always cater to observational elicitation, and are not an answer to high-dimension challenges.

Given the state of the art, it sounds like we are still far from seeing prior elicitation as a natural part of Bayesian software and probabilistic programming. Even when using a modular, model-agnostic strategy. But this is most certainly a worthy prospect!

## what the whale?! [“whales eat carbon, not fish”]

Posted in Kids, pictures, Travel with tags , , , , , , , , , , on February 21, 2021 by xi'an

## look, look, confidence! [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , on April 23, 2018 by xi'an

As it happens, I recently bought [with Amazon Associate earnings] a (used) copy of Confidence, Likelihood, Probability (Statistical Inference with Confidence Distributions), by Tore Schweder and Nils Hjort, to try to understand this confusing notion of confidence distributions. (And hence did not get the book from CUP or anyone else towards purposely writing a review. Or a ½-review like the one below.)

“Fisher squared the circle and obtained a posterior without a prior.” (p.419)

Now that I have gone through a few chapters, I am no less confused about the point of this notion. Which seems to rely on the availability of confidence intervals. Exact or asymptotic ones. The authors plainly recognise (p.61) that a confidence distribution is neither a posterior distribution nor a fiducial distribution, hence cutting off any possible Bayesian usage of the approach. Which seems right in that there is no coherence behind the construct, meaning for instance there is no joint distribution corresponding to the resulting marginals. Or even a specific dominating measure in the parameter space. (Always go looking for the dominating measure!) As usual with frequentist procedures, there is always a feeling of arbitrariness in the resolution, as for instance in the Neyman-Scott problem (p.112) where the profile likelihood and the deviance do not work, but considering directly the distribution of the (inconsistent) MLE of the variance “saves the day”, which sounds a bit like starting from the solution. Another statistical freak, the Fieller-Creasy problem (p.116) remains a freak in this context as it does not seem to allow for a confidence distribution. I also notice an ambivalence in the discourse of the authors of this book, namely that while they claim confidence distributions are both outside a probabilisation of the parameter and inside, “producing distributions for parameters of interest given the data (…) with fewer philosophical and interpretational obstacles” (p.428).

“Bias is particularly difficult to discuss for Bayesian methods, and seems not to be a worry for most Bayesian statisticians.” (p.10)

The discussions as to whether or not confidence distributions form a synthesis of Bayesianism and frequentism always fall short from being convincing, the choice of (or the dependence on) a prior distribution appearing to the authors as a failure of the former approach. Or unnecessarily complicated when there are nuisance parameters. Apparently missing on the (high) degree of subjectivity involved in creating the confidence procedures. Chapter 1 contains a section on “Why not go Bayesian?” that starts from Chris Sims‘ Nobel Lecture on the appeal of Bayesian methods and goes [softly] rampaging through each item. One point (3) is recurrent in many criticisms of B and I always wonder whether or not it is tongue-in-cheek-y… Namely the fact that parameters of a model are rarely if ever stochastic. This is a misrepresentation of the use of prior and posterior distributions [which are in fact] as summaries of information cum uncertainty. About a true fixed parameter. Refusing as does the book to endow posteriors with an epistemic meaning (except for “Bayesian of the Lindley breed” (p.419) is thus most curious. (The debate is repeating in the final(e) chapter as “why the world need not be Bayesian after all”.)

“To obtain frequentist unbiasedness, the Bayesian will have to choose her prior with unbiasedness in mind. Is she then a Bayesian?” (p.430)

A general puzzling feature of the book is that notions are not always immediately defined, but rather discussed and illustrated first. As for instance for the central notion of fiducial probability (Section 1.7, then Chapter 6), maybe because Fisher himself did not have a general principle to advance. The construction of a confidence distribution most often keeps a measure of mystery (and arbitrariness), outside the rather stylised setting of exponential families and sufficient (conditionally so) statistics. (Incidentally, our 2012 ABC survey is [kindly] quoted in relation with approximate sufficiency (p.180), while it does not sound particularly related to this part of the book. Now, is there an ABC version of confidence distributions? Or an ABC derivation?) This is not to imply that the book is uninteresting!, as I found reading it quite entertaining, with many humorous and tongue-in-cheek remarks, like “From Fraser (1961a) and until Fraser (2011), and hopefully even further” (p.92), and great datasets. (Including one entitled Pornoscope, which is about drosophilia mating.) And also datasets with lesser greatness, like the 3000 mink whales that were killed for Example 8.5, where the authors if not the whales “are saved by a large and informative dataset”… (Whaling is a recurrent [national?] theme throughout the book, along with sport statistics usually involving Norway!)

Miscellanea: The interest of the authors in the topic is credited to bowhead whales, more precisely to Adrian Raftery’s geometric merging (or melding) of two priors and to the resulting Borel paradox (xiii). Proposal that I remember Adrian presenting in Luminy, presumably in 1994. Or maybe in Aussois the year after. The book also repeats Don Fraser’s notion that the likelihood is a sufficient statistic, a point that still bothers me. (On the side, I realised while reading Confidence, &tc., that ABC cannot comply with the likelihood principle.) To end up on a French nitpicking note (!), Quenouille is typ(o)ed Quenoille in the main text, the references and the index. (Blame the .bib file!)

Posted in Books, Statistics with tags , , , , , , , on September 10, 2009 by xi'an

I really like the models derived from capture-recapture experiments, because they encompass latent variables, hidden Markov process, Gibbs simulation, EM estimation, and hierarchical models in a simple setup with a nice side story to motivate it (at least in Ecology, in Social Sciences, those models are rather associated with sad stories like homeless, heroin addicts or prostitutes…) I was thus quite surprised to hear from many that the capture-recapture chapter in Bayesian Core was hard to understand. In a sense, I find it easier than the mixture chapter because the data is discrete and everything can [almost!] be done by hand…

Today I received an email from Cristiano about a typo in The Bayesian Choice concerning capture-recapture models:

“I’ve read the paragraph (4.3.3) in your book and I have some doubts about the proposed formula in example 4.3.3. My guess is that a typo is here, where (n-n_1) instead of n_2 should appear in the hypergeometric distribution.”

It is indeed the case! This mistake has been surviving the many revisions and reprints of the book and is also found in the French translation Le Choix Bayésien, in Example 4.19… In both cases, ${n_2 \choose n_2-n_{11}}$ should be ${n-n_1 \choose n_2-n_{11}}$, shame on me! (The mistake does not appear in Bayesian Core.)

to which I can only suggest to incorporate the error-in-variable structure, ie the possible confusion  in identifying individuals, within the model and to run a Gibbs sampler that simulates iteratively the latent variable” true numbers of individuals in captures 1 and 2″ and the parameters given those latent variables. This problem of counting the same individual twice or more has obvious applications in Ecology, when animals are only identified by watchers, as in whale sightings, and in Social Sciences, when individuals are lacking identification. [To answer specifically the overestimation question, this is clearly the case since $n_1$ and $n_2$ are larger than in truth, while $n_{11}$ presumably remains the same….]