Archive for Read paper

safe Bayes & e-values & least favourable priors

Posted in Books, pictures, Statistics, University life on March 2, 2024 by xi'an

The paper by Peter Grünwald, Rianne de Heide and Wouter Koolen on safe testing was read before the Royal Statistical Society at a meeting organized by the Research Section on Wednesday, 24 January 2024, after many years in the making, to the point that several papers building on this initial one have appeared in the meantime, including submissions to Biometrika and one in the current issue of Statistical Science dedicated to reproducibility and replicability. Joshua Bon and I wrote a discussion that synthesised the following, sometimes rambling, remarks.

Overall, this is a mind-challenging paper with definitely original style and contents for which the authors are to be congratulated!

“…p-values are interpreted as indicating amounts of evidence against the null, and their definition does not need to refer to any specific alternative H¹. Exactly the same holds for e-values: the basic interpretation ‘a large e-value provides evidence against H’ holds no matter how the e-variable is defined, as long as it satisfies (1). If they are defined relative to H¹ that is close to the actual process generating the data they will grow fast and provide a lot of evidence, but the basic interpretation holds regardless.”

About the entry section, one may ask why a Bayesian would want to test the veracity of a null hypothesis in the first place. The debate has been raging since the early days, although Jeffreys devoted two chapters of his book to testing (while it already appears in Example 5, p.14, for his point estimation prior). From an opposite viewpoint, the construction of e-values and the like in the paper is highly model dependent, but all models are wrong! and, more to the point, both hypotheses may turn out to be wrong in misspecified cases. The notion thus seems, on the contrary, to be very much M-closed, with no indication of what happens under misspecified models or of why rejecting H⁰ is the ultimate argument.

When introducing e-values, (1) is not a definition per se, since otherwise E≡1 would qualify as an e-value. This is unfortunate as the topic is already confusing enough. E[E] must be larger than 1 under H¹, since otherwise the product of e-values would always degenerate to zero (?)
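For concreteness, here is a minimal numerical sketch (mine, not the authors') of the canonical likelihood-ratio e-variable in a Gaussian location model, with H⁰: X∼N(0,1) and an arbitrary fixed alternative mean mu1, illustrating both the expectation constraint (1) and the above remark that E≡1 satisfies it trivially:

import numpy as np
rng = np.random.default_rng(0)
# H0: X ~ N(0,1); H1: X ~ N(mu1,1) for a fixed, arbitrary alternative mu1.
# The likelihood ratio E = p1(X)/p0(X) is the canonical e-variable, since
# E_{H0}[E] = 1, so that the constraint (1), E_{H0}[E] <= 1, holds.
mu1 = 0.5
def e_value(x):
    return np.exp(mu1 * x - mu1**2 / 2)  # N(mu1,1) vs N(0,1) likelihood ratio
x0 = rng.normal(0.0, 1.0, size=10**6)  # data generated under H0
x1 = rng.normal(mu1, 1.0, size=10**6)  # data generated under H1
print(e_value(x0).mean())  # ~1.00: the constraint is tight here
print(e_value(x1).mean())  # ~exp(mu1^2) > 1: evidence grows under H1
# E identically equal to 1 also satisfies the constraint, trivially: a
# valid but useless e-variable, which is the complaint about (1) above.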

The points

  1. behaviour under optional continuation [by a martingale reasoning]
  2. interpretation as ‘evidence against the null’ as gambling [unethical!]
  3. in all cases preserving frequentist Type I error guarantees
  4. e-variables turn out to be Bayes factors based on the right Haar prior [rather than sometimes with highly unusual (e.g. degenerate) priors? p.4]
  5. e-variables need more extreme data than p-values in order to reject the null

are rather worthwhile, even though 2. is vague and 3. is firmly frequentist. Any theory involving Haar priors (and, even better, amenability) cannot be all wrong, though, even considering that Haar priors are improper. The optional continuation in 1. is a nice argument from a Bayesian viewpoint, since it has also been used to defend the Bayesian approach. Point 4. brings a formal way to define least favourable priors in the testing sense. One may then wonder at the connection with the solution of Bayarri and García-Donato (Biometrika, 2007). The perspective adopted therein is somewhat the inverse of the more common stance, where the prior on H⁰ is the starting point [and obviously known]. So, is there a dual version of e-values where this would happen, i.e., one deriving the optimal prior on H¹ for a given prior on H⁰? (Which would further offer a maximin interpretation.) Theorem 1 indeed sounds like the minimax=maximin result for test settings. (In Corollary 2, why is (10) necessarily a Bayes factor, given the two models?)
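As an illustration of point 1., here is a hedged simulation sketch (again mine, not the paper's) of optional continuation: multiplying the e-values of successive batches, with the decision to stop or continue depending on the running product, still keeps the Type I error below α, by Ville's inequality applied to the resulting test (super)martingale:

import numpy as np
rng = np.random.default_rng(1)
alpha, mu1, n_rep = 0.05, 0.5, 20_000
def batch_e(rng, n=20):
    # e-value of one batch of n observations generated under H0 (mean 0),
    # via the likelihood ratio against the fixed alternative N(mu1, 1)
    x = rng.normal(0.0, 1.0, size=n)
    return np.exp(mu1 * x.sum() - n * mu1**2 / 2)
rejections = 0
for _ in range(n_rep):
    e = 1.0
    for _ in range(5):          # up to five optionally-continued batches
        e *= batch_e(rng)
        if e >= 1 / alpha:      # reject H0 and stop
            rejections += 1
            break
        if e < 0.5:             # data-dependent decision to give up
            break
print(rejections / n_rep)       # remains below alpha = 0.05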

While I first thought that the approach leads to finding a proper prior, the “Almost Bayesian Case” [p.17] (ABC!!) comes to justify the use of a “common” improper prior over nuisance parameters under both hypotheses, which, while more justifiable than in the original objective Bayes literature, remains unsatisfactory to me. But I like the notion in 2.2 [p.10] that a prior chosen on H¹ forces one to adopt a particular corresponding prior on H⁰, as it defines a form of automated projection that we also considered in Goutis [RIP] and Robert (Biometrika, 1998). Corollary 2 is most interesting as well. However, taking the toy example of H⁰ being a normal mean lying in (-a,a) seems to lead to the optimal prior on H⁰ being a point mass at ±a for any marginal m(y) centred at zero, a disappointing outcome when compared with the point-mass situation. It is another disappointment that the Bayes factor cannot be an e-value since (6) fails to hold, but (1) is not (6) and one could argue that the BF is an e-value when integrating under the marginals!
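For what it is worth, here is a hedged numerical check of this toy example, with the candidate two-point-mass prior at ±a on H⁰ and an arbitrary zero-centred marginal m1 = N(0, 1+τ²) under H¹ (both choices mine, not the paper's); the least favourable reading above would predict E_θ[m1(Y)/m0(Y)] ≤ 1 over (-a,a), with the maximum approached at θ = ±a:

import numpy as np
from scipy import stats
from scipy.integrate import quad
a, tau = 1.0, 1.0
def m0(y):   # marginal under the candidate two-point prior on H0
    return 0.5 * (stats.norm.pdf(y, -a) + stats.norm.pdf(y, a))
def m1(y):   # an arbitrary zero-centred marginal under H1
    return stats.norm.pdf(y, 0.0, np.sqrt(1 + tau**2))
def expected_e(theta):   # E_theta[m1(Y)/m0(Y)] for Y ~ N(theta, 1)
    f = lambda y: stats.norm.pdf(y, theta) * m1(y) / m0(y)
    return quad(f, -np.inf, np.inf)[0]
for theta in np.linspace(-a, a, 9):
    # the least favourable reading predicts values <= 1, maximal at +/-a
    print(f"theta = {theta:+.2f}   E_theta[E] = {expected_e(theta):.4f}")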

As marginalia, the paper made me learn about the term (and theme of the) tragedy of the commons, a concept developed by [the neomalthusian and eugenicist] Garrett Hardin.

In conclusion, we congratulate the authors on this endeavour, but it remains unclear to us (as Bayesians) (i) how to construct the least favourable prior on H⁰ on a general basis, especially from a computational viewpoint, and, more importantly, (ii) whether it is at all of inferential interest [i.e., whether it degenerates into a point mass]. With respect to the sequential directions of the paper, we also wonder at potential connections with sequential Monte Carlo, for instance towards conducting sequential model choice by efficiently constructing an amalgamated evidence value when the product of Bayes factors is not a Bayes factor (see Buchholz et al., 2023).

Arnak Dalalyan at the RSS Journal Webinar

Posted in Books, pictures, Statistics, Travel, University life on October 15, 2023 by xi'an

My friend and CREST colleague Arnak Dalalyan will (re)present [online] a Read Paper at the RSS on 31 October with my friends Hani Doss and Alain Durmus as discussants:

‘Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities’

Arnak Dalalyan ENSAE Paris, France

Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence intervals. In many situations, exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful non-asymptotic guarantees for the approximate sampling procedures, especially in high dimensional problems. The paper makes some progress in this direction by considering the problem of sampling from a distribution having a smooth and log-concave density defined on ℝᵖ, for some integer p > 0. We establish non-asymptotic bounds for the error of approximating the target distribution by the distribution obtained by the Langevin Monte Carlo method and its variants. We illustrate the effectiveness of the established guarantees with various experiments. Underlying our analysis are insights from the theory of continuous-time diffusion processes, which may be of interest beyond the framework of log-concave densities that are considered in the present work.
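As a reminder of the method being analysed, here is a minimal sketch (not taken from the paper) of the unadjusted Langevin Monte Carlo algorithm, run on an illustrative Gaussian target with a hand-picked step size h:

import numpy as np
rng = np.random.default_rng(2)
# Unadjusted Langevin Monte Carlo for a smooth log-concave target on R^p:
#   theta_{k+1} = theta_k + h * grad log pi(theta_k) + sqrt(2h) N(0, I_p)
# Illustrative target: N(mu, I_p), so grad log pi(x) = -(x - mu).
p, h, n_iter = 10, 0.05, 10_000
mu = np.ones(p)
x = np.zeros(p)
samples = np.empty((n_iter, p))
for k in range(n_iter):
    x = x + h * (-(x - mu)) + np.sqrt(2 * h) * rng.standard_normal(p)
    samples[k] = x
# The stationary law of this chain is only an O(h) approximation of pi;
# the paper's bounds quantify the gap non-asymptotically in p and h.
print(samples[n_iter // 2:].mean(axis=0))   # close to mu after burn-in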

optimal importance sampling

Posted in Books, pictures, Statistics, University life on May 31, 2023 by xi'an

In Stein Π-Importance Sampling, Congye Wang et al. (mostly from Newcastle, UK) build an MCMC scheme with invariant distribution Π targeting a distribution P, showing that the optimal solution (in terms of a discrepancy) differs from P when the chain output is Stein-weighted, e.g., via kernel Stein discrepancies. In terms of densities, the solution is

\pi^\star(x)\propto p(x)k_P(x)^{1/2}

the correction involving the square root of a Stein kernel, as introduced by Oates, Girolami, and Chopin in their 2017 Series B Read Paper. This is rather paradoxical, even though the outcome does depend on the divergence criterion. Most intriguing!!!
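To make the correction concrete, here is a one-dimensional sketch with choices of target and kernel that are mine rather than the paper's: for P = N(0,1) and the Langevin Stein kernel built from the IMQ base kernel k(x,y) = (1+(x−y)²)^{-1/2}, a short calculation gives the diagonal k_P(x,x) = 1 + x², so that π* inflates the tails of p, and self-normalised importance weights recover P-expectations from Π-samples:

import numpy as np
from scipy import stats
# Corrected target pi*(x) ∝ p(x) k_P(x,x)^{1/2} for P = N(0,1), with the
# Langevin Stein kernel built from the IMQ base kernel
# k(x,y) = (1+(x-y)^2)^{-1/2}; for this pair, k_P(x,x) = 1 + x^2.
rng = np.random.default_rng(3)
xs = np.linspace(-6, 6, 2001)
p = stats.norm.pdf(xs)
pi_star = p * np.sqrt(1 + xs**2)
pi_star /= pi_star.sum()             # discrete normalisation on the grid
# pi* has heavier shoulders than p; reweighting Pi-samples by
# w(x) = p(x)/pi*(x) still recovers P-expectations (importance sampling).
idx = rng.choice(len(xs), size=10**5, p=pi_star)
x, w = xs[idx], p[idx] / pi_star[idx]
print(np.sum(w * x**2) / np.sum(w))  # ~1 = E_P[X^2]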

Estimating means of bounded random variables by betting

Posted in Books, Statistics, University life on April 9, 2023 by xi'an

Ian Waudby-Smith and Aaditya Ramdas are presenting a Read Paper next month to the Royal Statistical Society in London on constructing a conservative confidence interval on the mean of a bounded random variable. Here is an extended abstract from within the paper:

For each m ∈ [0, 1], we set up a “fair” multi-round game of statistician against nature whose payoff rules are such that if the true mean happened to equal m, then the statistician can neither gain nor lose wealth in expectation (their wealth in the m-th game is a nonnegative martingale), but if the mean is not m, then it is possible to bet smartly and make money. Each round involves the statistician making a bet on the next observation, nature revealing the observation and giving the appropriate (positive or negative) payoff to the statistician. The statistician then plays all these games (one for each m) in parallel, starting each with one unit of wealth, and possibly using a different, adaptive, betting strategy in each. The 1 − α confidence set at time t consists of all m ∈ [0, 1] such that the statistician’s money in the corresponding game has not crossed 1/α. The true mean μ will be in this set with high probability.
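Here is a minimal sketch of the betting game described in this abstract, simplified to a fixed bet λ played in both directions rather than the paper's adaptive strategies (my simplification), on Beta-distributed data with true mean 2/7:

import numpy as np
rng = np.random.default_rng(4)
alpha, lam, n = 0.05, 0.5, 500
x = rng.beta(2, 5, size=n)          # bounded data in [0,1], true mean 2/7
grid = np.linspace(0, 1, 1001)      # candidate means m, one game per m
# Capital K_t(m) = prod_i (1 + lam*(x_i - m)) is a nonnegative martingale
# when the true mean is m (a fair bet); we bet in both directions and
# average the two capitals so that the resulting set is two-sided.
up = np.cumprod(1 + lam * (x[:, None] - grid[None, :]), axis=0)
dn = np.cumprod(1 - lam * (x[:, None] - grid[None, :]), axis=0)
capital = 0.5 * (up + dn)
# Ville's inequality: P(sup_t K_t(mu) >= 1/alpha) <= alpha, so keeping the
# m's whose game never crossed 1/alpha gives a confidence sequence valid
# at every sample size simultaneously.
alive = capital.max(axis=0) < 1 / alpha
print(grid[alive].min(), grid[alive].max())  # brackets 2/7 ≈ 0.2857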

I read the paper on the flight back from Venice and was impressed by its universality, especially for a non-asymptotic method, while finding the expository style somewhat unusual for Series B, with notions defined late, if defined at all. As an aside, I also enjoyed the historical connection to Jean Ville's 1939 PhD thesis (examined by Borel, Fréchet, his advisor, and Garnier) on a critical examination of [von Mises'] Kollektive. (The story by Glenn Shafer of Ville's life till the war is remarkable, with the de Beauvoir-Sartre couple making a surprising and rather unglorious appearance!) Ville was himself inspired by a meeting with Wald while in Berlin. The paper remains quite allusive about Ville's contribution, though, while arguing about its advances relative to Ville's work… The confidence intervals (and sequences) depend on a supermartingale construction of the form

M_t(m):=\prod_{i=1}^t \exp\left\{ \lambda_i(X_i-m)-v_i\psi(\lambda_i)\right\}

which allows for a universal coverage guarantee for the derived intervals (and can be optimised in λ). As I am, at this point, confused about the overall purpose of the analysis, besides providing an efficient confidence construction, and am lacking in background about martingales, betting, and sequential testing, I will not contribute to the discussion. Especially since ChatGPT cannot help me much, with its main “criticisms” (which I managed to receive while in Italy, despite the Italian Government banning the chatbot!)

However, there are also some potential limitations and challenges to this approach. One limitation is that the accuracy of the method is dependent on the quality of the prior distribution used to set the odds. If the prior distribution is poorly chosen, the resulting estimates may be inaccurate. Additionally, the method may not work well for more complex or high-dimensional problems, where there may not be a clear and intuitive way to set up the betting framework.

and

Another potential consequence is that the use of a betting framework could raise ethical concerns. For example, if the bets are placed on sensitive or controversial topics, such as medical research or political outcomes, there may be concerns about the potential for manipulation or bias in the betting markets. Additionally, the use of betting as a method for scientific or policy decision-making may raise questions about the appropriate role of gambling in these contexts.

being totally off the radar… (No prior involved, no real-life consequence for betting, no gambling.)

Bayesian inference: challenges, perspectives, and prospects

Posted in Books, Statistics, University life on March 29, 2023 by xi'an

Over the past year, Judith, Michael and I edited a special issue of Philosophical Transactions of the Royal Society on Bayesian inference: challenges, perspectives, and prospects, in celebration of the current President of the Royal Society, Adrian Smith, and his contributions to Bayesian analysis that have impacted the field up to this day. The issue is now out! The following is the beginning of our introduction of the series.

When contemplating his past achievements, it is striking to align the emergence of massive advances in these fields with some papers or books of his. For instance, Lindley's & Smith's ‘Bayes Estimates for the Linear Model’ (1971), a Read Paper at the Royal Statistical Society, is making the case for the Bayesian analysis of this most standard statistical model, as well as emphasizing the notion of exchangeability that is foundational in Bayesian statistics, and paving the way to the emergence of hierarchical Bayesian modelling. It thus makes a link between the early days of Bruno de Finetti, whose work Adrian Smith translated into English, and the current research in non-parametric and robust statistics. Bernardo's & Smith's masterpiece, Bayesian Theory (1994), sets statistical inference within decision- and information-theoretic frameworks in a most elegant and universal manner that could be deemed a Bourbaki volume for Bayesian statistics if this classification endeavour had reached further than pure mathematics. It also emphasizes the central role of hierarchical modelling in the construction of priors, as exemplified in Carlin et al.'s ‘Hierarchical Bayesian analysis of change point problems’ (1992).

The series of papers published in 1990 by Alan Gelfand & Adrian Smith, esp. ‘Sampling-Based Approaches to Calculating Marginal Densities’ (1990), is overwhelmingly perceived as the birth date of modern Markov chain Monte Carlo (MCMC) methods, as it brought to the whole statistics community (and quickly to wider communities) the realization that MCMC simulation was the sesame to unlock complex modelling issues. The consequences on the adoption of Bayesian modelling by non-specialists are enormous and long-lasting. Similarly, Gordon et al.'s ‘Novel approach to nonlinear/non-Gaussian Bayesian state estimation’ (1993) is considered as the birthplace of sequential Monte Carlo, aka particle filtering, with considerable consequences in tracking, robotics, econometrics and many other fields. Titterington, Smith & Makov's reference book, ‘Statistical Analysis of Finite Mixture Distributions’ (1985), is a precursor in the formalization of heterogeneous data structures, paving the way for the incoming MCMC resolutions like Tanner & Wong (1987), Gelman & King (1990) and Diebolt & Robert (1990). Denison et al.'s book, ‘Bayesian Methods for Nonlinear Classification and Regression’ (2002), is another testimony to the influence of Adrian Smith on the field, stressing the emergence of robust and general classification and nonlinear regression methods to analyse complex data, prefiguring in a way the later emergence of machine-learning methods, with the additional Bayesian assessment of uncertainty. It also brings forward the capacity of operating Bayesian non-parametric modelling that is now broadly accepted, following a series of papers by Denison et al. in the late 1990s like CART and MARS.

We are quite grateful to the authors contributing to this volume, namely Joshua J. Bon, Adam Bretherton, Katie Buchhorn, Susanna Cramb, Christopher Drovandi, Conor Hassan, Adrianne L. Jenner, Helen J. Mayfield, James M. McGree, Kerrie Mengersen, Aiden Price, Robert Salomone, Edgar Santos-Fernandez, Julie Vercelloni and Xiaoyu Wang, Afonso S. Bandeira, Antoine Maillard, Richard Nickl and Sven Wang, Fan Li, Peng Ding and Fabrizia Mealli, Matthew Stephens, Peter D. Grünwald, Sumio Watanabe, Peter Müller, Noirrit K. Chandra and Abhra Sarkar, Kori Khan and Alicia Carriquiry, Arnaud Doucet, Eric Moulines and Achille Thin, Beatrice Franzolini, Andrea Cremaschi, Willem van den Boom and Maria De Iorio, Sandra Fortini and Sonia Petrone, Sylvia Frühwirth-Schnatter, Sara Wade, Chris C. Holmes and Stephen G. Walker, Lizhen Nie and Veronika Ročková. Some of the papers are open-access, if not all, hence enjoy them!