## martingale posteriors

Posted in Books, Statistics, University life on November 7, 2022 by xi'an

A new Royal Statistical Society Read Paper by Edwin Fong, Chris Holmes, and Steve Walker. Starting from the predictive

$$p(y_{n+1:\infty}\mid y_{1:n})\qquad(1)$$

rather than from the posterior distribution on the parameter is a fairly novel idea, also pursued by Sonia Petrone and some of her coauthors. It thus adopts a de Finetti perspective while adding some substance to the rather metaphysical nature of the original. It however relies on the “existence” of an infinite sample in (1), which assumes a form of underlying model à la von Mises or at least an infinite population. The representation of a parameter θ as a function of an infinite sequence comes as a shock at first but starts making sense when considering it as a functional of the underlying distribution. Of course, trading (modelling) a random “opaque” parameter θ for (envisioning) an infinite sequence of random (un)observations may sound like a sure loss rather than a great deal, but it gives substance to the epistemic uncertainty about a distributional parameter, even when a model is assumed, as in Example 1, which defines θ in the usual parametric way (i.e., the mean of the iid variables). Furthermore, the link with the bootstrap, and even more with the Bayesian bootstrap, becomes clear when θ is seen this way.
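To make θ-as-a-functional-of-the-sequence concrete, here is a minimal sketch of predictive resampling, assuming the crudest possible predictive, where each new $y_{i+1}$ is drawn uniformly among $y_{1:i}$ (a Pólya urn), which is not necessarily the predictive used in the paper. θ is taken to be the mean, and the infinite horizon is truncated at a hypothetical $n+N$; the function names and tuning values are mine.

```python
import random


def polya_urn_sequence(y_obs, n_future, rng):
    """Impute y_{n+1}, ..., y_{n+n_future}: each new value is drawn
    uniformly from all values seen so far (observed plus imputed),
    i.e. a Polya-urn predictive (an assumption, not the paper's choice)."""
    seq = list(y_obs)
    for _ in range(n_future):
        seq.append(rng.choice(seq))
    return seq


def martingale_posterior_mean(y_obs, n_future=1000, n_draws=400, seed=1):
    """Draw theta = mean of the completed sequence, the infinite horizon
    being truncated at n + n_future (hypothetical tuning values)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        seq = polya_urn_sequence(y_obs, n_future, rng)
        draws.append(sum(seq) / len(seq))
    return draws
```

Calling `martingale_posterior_mean([1.2, 0.4, 2.7, 1.9, 0.8])` returns 400 draws of θ centred on the sample mean 1.4, since the sequence of running means is a martingale. Note that every imputed value is a copy of an observed one.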

I am always a fan of minimal loss approaches, but I wonder about (2.4), as it defines either a moment or a true parameter value that depends on the parametric family indexed by θ. Hence it does not exist outside the primary definition of said parametric family, which limits its appeal. The subsequent construct of the empirical cdf based on the infinite sequence, as providing the θ function, is elegant, but what is its Bayesian justification? (I did not read Appendix C.2. in full detail but could not spot a prior on F.)
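A loss only pins down a parameter once it is chosen. Assuming (my reading) that (2.4) has the generic form $\theta(F)=\arg\min_\theta \int \ell(\theta,y)\,\text{d}F(y)$, squared loss returns the mean and absolute loss a median whatever the family, while a negative log-likelihood loss returns a parameter that only makes sense within that family. A minimal sketch with $F$ the empirical cdf of a finite sequence and a crude grid search (the helper names are hypothetical):

```python
def empirical_risk(theta, ys, loss):
    """Integral of loss(theta, y) against the empirical cdf of ys."""
    return sum(loss(theta, y) for y in ys) / len(ys)


def minimal_loss_theta(ys, loss, grid):
    """theta(F) = argmin of the empirical risk, searched over a candidate grid."""
    return min(grid, key=lambda t: empirical_risk(t, ys, loss))


def squared_loss(theta, y):
    return (theta - y) ** 2  # minimiser: the mean


def absolute_loss(theta, y):
    return abs(theta - y)  # minimiser: a median


ys = [1.2, 0.4, 2.7, 1.9, 0.8]
grid = [i / 100 for i in range(301)]
theta_mean = minimal_loss_theta(ys, squared_loss, grid)
theta_median = minimal_loss_theta(ys, absolute_loss, grid)
```

Here `theta_mean` is 1.4 (the sample mean) and `theta_median` is 1.2 (the sample median): the same sequence yields different "parameters" depending on the loss alone.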

“The resemblance of the martingale posterior to a bootstrap estimator should not have gone unnoticed”

While I completely missed the resemblance, it is indeed the case that, if the predictive at each step is built from the earlier “sample”, the support is not going to evolve. However, this is not particularly exciting, as the resulting Bayesian non-parametric estimator is most rudimentary. This seems to bring us back to Rubin (1981)?! A Dirichlet prior is mentioned with no further detail. And I am getting confused by the complete lack of structure, prior, &tc. It seems to contradict the next section:
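The rudimentary estimator in question is Rubin's (1981) Bayesian bootstrap, where the posterior on F puts Dirichlet(1, …, 1) weights on the observed points; this is also the limiting distribution of the urn proportions when each predictive draw copies an earlier value. A minimal sketch for θ the mean (function names and defaults are hypothetical), which makes the frozen support visible since each draw only reweights the observed points:

```python
import random


def dirichlet_weights(n, rng):
    """w ~ Dirichlet(1, ..., 1), simulated as normalised Exp(1) variates."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [gi / total for gi in g]


def bayesian_bootstrap_mean(y_obs, n_draws=2000, seed=1):
    """Rubin's Bayesian bootstrap for theta = mean: every draw is a
    reweighting of the observed points only, so the support never grows."""
    rng = random.Random(seed)
    return [
        sum(w * y for w, y in zip(dirichlet_weights(len(y_obs), rng), y_obs))
        for _ in range(n_draws)
    ]
```

For n observations, the draws have mean $\bar y$ and variance $\sum_i (y_i-\bar y)^2/\{n(n+1)\}$, a quick check on the simulation.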

“While the prescription of (3.1) remains a subjective task, we find it to be no more subjective than the selection of a likelihood function”

Copulas!!! Again, I am very glad to see copulas involved in the analysis. However, I remain unclear as to why Corollary 1 implies that any sequence of copulas could do the job. Further, why does the Gaussian copula appear as the default choice? What is the computing cost of the update (4.4) after k steps? Similarly, (4.7) uses a very special form of copula, with independent-across-dimension increments. I am also missing a guided tour of the implementation, as it sounds explosive in book-keeping and multiplication, while relying on a single hyperparameter in (4.5.2)?
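To get a feel for the book-keeping, here is a minimal sketch of a univariate Gaussian-copula recursion, assuming (my reading, not checked against the paper) an update of the form $p_i(y)=[1-\alpha_i+\alpha_i c_\rho(P_{i-1}(y),P_{i-1}(y_i))]\,p_{i-1}(y)$, with the cdf updated through the conditional copula cdf. The N(0,1) initial guess, ρ = 0.8, the weights $\alpha_i=(2-1/i)/(i+1)$, and the grid are all hypothetical choices. Tabulating p and P on a grid of G points makes each step cost O(G), hence O(kG) after k steps, whereas evaluating $p_k$ at a fresh point requires replaying all k factors.

```python
import bisect
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf, pdf, inv_cdf


def clip(u, eps=1e-12):
    """Keep u strictly inside (0, 1) so that inv_cdf is defined."""
    return min(max(u, eps), 1.0 - eps)


def copula_density(u, v, rho):
    """Bivariate Gaussian copula density c_rho(u, v)."""
    x, y = STD.inv_cdf(u), STD.inv_cdf(v)
    r2 = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * r2)
    return math.exp(expo) / math.sqrt(r2)


def copula_hfunc(u, v, rho):
    """C(u | v): integral of c_rho(s, v) over s in [0, u]."""
    x, y = STD.inv_cdf(u), STD.inv_cdf(v)
    return STD.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


def interp(x, xs, fs):
    """Linear interpolation of fs (tabulated at sorted xs), clamped at the ends."""
    if x <= xs[0]:
        return fs[0]
    if x >= xs[-1]:
        return fs[-1]
    j = bisect.bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return fs[j - 1] + t * (fs[j] - fs[j - 1])


def copula_predictive(data, rho=0.8, grid=None):
    """Recursive update of density p and cdf P on a fixed grid.
    Each observation costs O(len(grid)), so k steps cost O(k * len(grid))."""
    if grid is None:
        grid = [-6.0 + 12.0 * j / 400 for j in range(401)]
    p = [STD.pdf(g) for g in grid]  # hypothetical initial guess p_0 = N(0, 1)
    P = [STD.cdf(g) for g in grid]
    for i, y in enumerate(data, start=1):
        a = (2.0 - 1.0 / i) / (i + 1)  # hypothetical weight sequence alpha_i
        v = clip(interp(y, grid, P))  # P_{i-1}(y_i), interpolated on the grid
        new_p, new_P = [], []
        for pj, Pj in zip(p, P):
            u = clip(Pj)
            new_p.append(pj * (1.0 - a + a * copula_density(u, v, rho)))
            new_P.append((1.0 - a) * Pj + a * copula_hfunc(u, v, rho))
        p, P = new_p, new_P
    return grid, p, P
```

As a sanity check, `copula_predictive([0.5])` returns, up to interpolation error, the mixture 0.5 N(0, 1) + 0.5 N(0.4, 0.6²), since the copula factor turns the N(0, 1) guess into the conditional density of a correlated Gaussian pair.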

In the illustration section, the use of the galaxy dataset may fail to appeal to Radford Neal, in a spirit similar to Chopin’s & Ridgway’s call to leave the Pima Indians alone, since he delivered a passionate lecture on the inappropriateness of a mixture model for this dataset (at ICMS in 2001). I am unclear as to how the number of modes is extracted from the infinite predictive. What is $\theta$ in this case?

## statistical aspects of climate change [discuss]

Posted in Books, pictures, Statistics, Travel, University life on August 4, 2022 by xi'an

As part of its annual conference in Aberdeen, Scotland, the RSS is organising a discussion meeting on two papers, to be held on Wednesday 14 September 2022, 5.00PM – 7.00PM (GMT+1), with free online registration.

Two papers will be presented:

‘Assessing present and future risk of water damage using building attributes, meteorology, and topography’ by Heinrich-Mertsching et al.
‘The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland’ by Clarkson et al.

“The Discussion Meeting at this year’s RSS conference in Aberdeen will feature two papers on the Statistical Aspects of Climate Change. The Discussion Meetings Committee chose this topic area motivated by the UN Climate Change Conference (COP26) held in Glasgow last year and because climate change and the environment is one of the RSS’s six current campaigning priorities for 2022.

You are welcome to listen to the speakers and join in the discussion of the papers which follows the presentations. All the proceedings will be published in a forthcoming issue of Journal of the Royal Statistical Society, Series C (Applied Statistics).”

Dr Shirley Coleman, Chair and Honorary Officer for Discussion Meetings

Posted in Books, Statistics, Travel, University life on April 4, 2022 by xi'an

The Royal Statistical Society is launching a series of discussions linked with the UK Government's handling of the COVID-19 pandemic (and of the related data):

• Communication during the pandemic: Data, statistical analyses and modelling, 5 April
Organising panel of David Spiegelhalter, Tom Chivers and Jen Rogers
Register for the in-person or online event
• Governments’ statistical resources, 3 May
Organising panel of Simon Briscoe and Gavin Freeguard
• Evidence and policy, 21 June
Organising panel of Sylvia Richardson, Dani De Angelis and John Aston
• Evaluation, 12 July
Organising panel of Sheila Bird, Christl Donnelly and Max Parmar

Posted in pictures, Statistics, University life on March 21, 2022 by xi'an

## Scrapping Covid surveillance study would put public health at risk [by Sylvia Richardson]

Posted in Books, Statistics, University life on February 28, 2022 by xi'an

Royal Statistical Society president (and very dear friend) Sylvia Richardson published last week this tribune in the Guardian, defending the preservation of a national surveillance system:

Sajid Javid is right to argue against scrapping the Office for National Statistics’ Covid surveillance study. Throughout the pandemic, national surveillance studies have provided invaluable information to support decision-making.

For any real-time health surveillance system to be reliable and cost-effective, it cannot rely solely on self-reported tests. These data sets are likely to be biased, as it is impossible to know how many people are also reporting their negative results and, if tests start to come with a cost, how many people simply aren’t testing. If we are to get reliable information about the prevalence of Covid, it is essential to maintain studies such as the ONS’s and React to allow statisticians to estimate infectiousness and the proportion of the population who are infected (including those without symptoms), as well as to identify new variants.

Abrupt disruption of a surveillance system is wasteful, will make tracking of prevalence meaningless and will put in jeopardy the future health of the public. If important surveillance studies must be scaled down, this cannot be led by arbitrary cost-cutting targets, but should be led by statisticians to ensure that studies continue to provide reliable information.