## selected parameters from observations

Posted in Books, Statistics on December 7, 2018 by xi'an

I recently read a fairly interesting paper by Daniel Yekutieli on a Bayesian perspective on parameters selected after viewing the data, published in Series B in 2012. (Disclaimer: I was not involved in processing this paper!)

The first example differentiates the Normal-Normal mean posterior, when θ is N(0,1) and x is N(θ,1), from the restricted posterior, when θ is N(0,1) and x is N(θ,1) truncated to (0,∞), by restating the latter as the repeated generation from the joint until x>0. This does not sound particularly controversial, except for the notion of selecting the parameter after viewing the data. That the posterior support may depend on the data is not that surprising..!
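The rejection restatement above can be sketched in a few lines of Python. This is only an illustration under my own made-up settings, not Yekutieli's code: draws from the joint are kept only when x>0, and the resulting shift in the selected draws shows how selection changes the distribution of both data and parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Repeated generation from the joint (theta ~ N(0,1), x | theta ~ N(theta,1)),
# keeping only the draws with x > 0: the rejection restatement of truncation.
theta = rng.normal(0.0, 1.0, size=500_000)
x = rng.normal(theta, 1.0)
keep = x > 0
theta_sel, x_sel = theta[keep], x[keep]

# Marginally x ~ N(0,2), so truncation to (0, inf) shifts its mean to
# 2/sqrt(pi) ~ 1.13; the selected theta's are shifted too (to 1/sqrt(pi)),
# showing that selection alters the parameter's distribution as well.
print(x_sel.mean(), theta_sel.mean())
```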

“The observation that selection affects Bayesian inference carries the important implication that in Bayesian analysis of large data sets, for each potential parameter, it is necessary to explicitly specify a selection rule that determines when inference is provided for the parameter and provide inference that is based on the selection-adjusted posterior distribution of the parameter.” (p.31)

The more interesting distinction is between “fixed” and “random” parameters (Section 2.1), which separates cases where the data comes from a truncated distribution (given the parameter) from cases where the joint distribution is truncated but misses the normalising constant (a function of θ) of the truncated sampling distribution. The “mixed” case introduces a hyperparameter λ and a normalising constant that integrates out θ and depends on λ, which amounts to switching to another (marginal) prior on θ. This is quite interesting, even though one can debate the very status as true parameters of the “random” and “mixed” “parameters”, which are those whose posterior most often changes. Take for instance Stephen Senn's example (p.6) of the mean associated with the largest observation in a Normal sample with distinct means. When accounting for the distribution of the largest variate, this random variable is no longer a Normal variate with a single unknown mean but instead depends on all the means of the sample. Speaking of the mean of the largest observation is therefore misleading, in that it is neither the mean of the largest observation nor a parameter per se, since the index [of the largest observation] is a random variable induced by the observed sample.
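Senn's example is easy to check by simulation. The sketch below uses three arbitrary means of my own choosing (not from the paper): the index achieving the maximum is itself random, and the largest observation is biased upward relative to every one of the individual means, so it cannot be treated as a draw from a single N(μ,1).

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.0, 0.5, 1.0])   # three distinct means (illustrative)

# Draw many replicate samples; record the largest observation in each
# replicate together with the (random) index at which it occurs.
draws = rng.normal(means, 1.0, size=(100_000, 3))
idx = draws.argmax(axis=1)
largest = draws.max(axis=1)

# Every component gets selected some of the time: the index is random.
print(np.bincount(idx, minlength=3) / len(idx))
# The largest observation has mean above even the largest of the three
# means, and its distribution depends on all of them at once.
print(largest.mean())
```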

In conclusion, a very original article, if difficult to assess, as it can be argued that selection models other than the “random” case result from an intentional modelling choice of the joint distribution.

## Large-scale Inference

Posted in Books, R, Statistics, University life on February 24, 2012 by xi'an

Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox' and Donnelly's Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can be classified as a textbook [even though Efron's has exercises], they differ very much in their intended audience and their purpose. As I wrote in the review of Principles of Applied Statistics, that book has an encompassing scope, with the goal of covering all the methodological steps required by a statistical study. In Large-scale Inference, Efron focuses on empirical Bayes methodology for large-scale inference, by which he mostly means multiple testing (rather than, say, data mining). As a result, the book is centred on mathematical statistics and is more technical. (Which does not mean it is less of an exciting read!) The book was recently reviewed by Jordi Prats for Significance. Like the previous reviewer, and unsurprisingly, I found the book nicely written, with a wealth of (colour!) R graphs (the R programs and datasets are available on Brad Efron's home page).

“I have perhaps abused the “mono” in monograph by featuring methods from my own work of the past decade.” (p.xi)

Sadly, I cannot remember if I read my first Efron paper via his 1977 introduction to the Stein phenomenon with Carl Morris in Pour la Science (the French translation of Scientific American) or through his 1983 Pour la Science paper with Persi Diaconis on computer intensive methods. (I would bet on the latter though.) In any case, I certainly read a lot of Efron's papers on the Stein phenomenon during my thesis, and it was thus with great pleasure that I saw he introduced empirical Bayes notions through the Stein phenomenon (Chapter 1). It actually took me a while, but I eventually (by page 90) realised that empirical Bayes was a proper subtitle to Large-Scale Inference, in that the large samples give some weight to the validation of empirical Bayes analyses, in the sense of reducing the importance of a genuine Bayesian modelling (even though I do not see why such a genuine Bayesian modelling could not be implemented in the cases covered in the book).

“Large N isn't infinity and empirical Bayes isn't Bayes.” (p.90)

The core of Large-scale Inference is multiple testing and the empirical Bayes justification/construction of Fdr's (false discovery rates). Efron wrote more than a dozen papers on this topic, covered in the book and building on the groundbreaking and highly cited Series B 1995 paper by Benjamini and Hochberg. (In retrospect, it should have been a Read Paper, and it was indeed made a “retrospective read paper” by the Research Section of the RSS.) Fdr's are essentially posterior probabilities and are therefore open to empirical Bayes approximations when priors are not selected. Before reaching the concept of Fdr's in Chapter 4, Efron goes over earlier procedures for removing multiple testing biases. As shown by a section title (“Is FDR Control “Hypothesis Testing”?”, p.58), one major point in the book is that an Fdr is more of an estimation procedure than a significance-testing object. (This is not a surprise from a Bayesian perspective, since the posterior probability is an estimate as well.)
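For readers who have not met it, the Benjamini-Hochberg step-up procedure at the root of all this can be sketched in a few lines. This is a plain implementation of the 1995 rule (reject the k smallest p-values, where k is the largest i with p₍ᵢ₎ ≤ αi/N), not Efron's empirical Bayes machinery; the function name and the example p-values are mine.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.10):
    """Benjamini-Hochberg step-up procedure: reject the k smallest
    p-values, where k is the largest i with p_(i) <= alpha * i / N."""
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, n + 1) / n
    below = pvals[order] <= thresh
    # Step-up: find the LAST sorted p-value under its threshold.
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return rejected

# Toy example: three small p-values are rejected, the large one is not.
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.1))
```

Note the step-up character: a p-value above its own threshold can still be rejected when a later sorted p-value passes, which is what distinguishes the rule from a per-test cutoff.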

“Scientific applications of single-test theory most often suppose, or hope for rejection of the null hypothesis (…) Large-scale studies are usually carried out with the expectation that most of the N cases will accept the null hypothesis.” (p.89)

On the innovations proposed by Efron and described in Large-scale Inference, I particularly enjoyed the notion of local Fdr's in Chapter 5 (essentially plug-in posterior probabilities that a given observation stems from the null component of the mixture) and the (Bayesian) improvement brought by empirical null estimation in Chapter 6 (“not something one estimates in classical hypothesis testing”, p.97), as well as the explanation for the inaccuracy of the bootstrap (which “stems from a simpler cause”, p.139), but found less crystal-clear the empirical evaluation of the accuracy of Fdr estimates (Chapter 7, “independence is only a dream”, p.113), maybe in relation to my early-career inability to explain Morris's (1983) correction for empirical Bayes confidence intervals (pp. 12-13). I also discovered the notion of enrichment in Chapter 9, with permutation tests resembling some low-key bootstrap, and multiclass models in Chapter 10, which look as if they could benefit from a hierarchical Bayes perspective. The last chapter happily concludes with one of my preferred stories, namely the missing species problem (on which I hope to work this very Spring).
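The plug-in nature of the local Fdr is easy to see on a two-groups model. The sketch below is an oracle computation under a mixture of my own choosing (π₀ and the non-null mean are made up, and the mixture density is used exactly rather than estimated, as Efron would do from the data): fdr(z) = π₀f₀(z)/f(z) is just the posterior probability that z comes from the null component.

```python
import numpy as np

def phi(z):
    """Standard Normal density."""
    return np.exp(-0.5 * z * z) / np.sqrt(2 * np.pi)

# Two-groups model (illustrative values): z ~ pi0 N(0,1) + (1-pi0) N(mu1,1).
pi0, mu1 = 0.9, 3.0

def local_fdr(z):
    """Local Fdr: posterior probability of the null at z, computed with
    the true mixture density f (in practice f, and often pi0, are
    estimated from the z-values themselves)."""
    f = pi0 * phi(z) + (1 - pi0) * phi(z - mu1)
    return pi0 * phi(z) / f

print(local_fdr(0.0))   # near 1: an unremarkable z is almost surely null
print(local_fdr(4.0))   # small: a large z most likely comes from the non-null
```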