## BFF⁷ postponed

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on March 31, 2020 by xi'an

## back to Ockham’s razor

Posted in Statistics with tags , , , , , , , , , on July 31, 2019 by xi'an

“All in all, the Bayesian argument for selecting the MAP model as the single ‘best’ model is suggestive but not compelling.”

Last month, Jonty Rougier and Carey Priebe arXived a paper on Ockham’s factor, with a generalisation of a prior distribution acting as a regulariser, R(θ). Calling on the late David MacKay to argue that the evidence involves the correct penalising factor although they acknowledge that his central argument is not absolutely convincing, being based on a first-order Laplace approximation to the posterior distribution and hence “dubious”. The current approach stems from the candidate’s formula that is already at the core of Sid Chib’s method. The log evidence then decomposes as the sum of the maximum log-likelihood minus the log of the posterior-to-prior ratio at the MAP estimator. Called the flexibility.

“Defining model complexity as flexibility unifies the Bayesian and Frequentist justifications for selecting a single model by maximizing the evidence.”

While they bring forward rational arguments to consider this as a measure model complexity, it remains at an informal level in that other functions of this ratio could be used as well. This is especially hard to accept by non-Bayesians in that it (seriously) depends on the choice of the prior distribution, as all transforms of the evidence would. I am thus skeptical about the reception of the argument by frequentists…

## O’Bayes 2019 conference program

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on May 13, 2019 by xi'an

The full and definitive program of the O’Bayes 2019 conference in Warwick is now on line. Including discussants for all papers. And the three [and free] tutorials on Friday afternoon, 28 June, on model selection (M. Barbieri), MCMC recent advances (G.O. Roberts) and BART (E.I. George). Registration remains open at the reduced rate and submissions of posters can still be sent to me for all conference participants.

## double yolk priors [a reply from the authors]

Posted in Books, Statistics, University life with tags , , , , , on March 14, 2018 by xi'an

[Here is an email I received from Subhadeep Mukhopadhyay, one of the authors of the paper I discussed yesterday.}
Thank for discussing our work. Let me clarify the technical point that you raised:
– The difference between Legj(u)_j and Tj=Legj(G(θ)). One is orthonormal polyn of L2[0,1] and the other one is L2[G]. The second one is poly of rank-transform G(θ).
– As you correctly pointed out there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at g⁻¹(θ), which is the d(u;G,F) over unit interval. Now, this new transformed function is a proper density.
-Thus the ratio now becomes d(G(θ)) which can be expended into (NOT in Leg-basis) in $T_j$, in eq (2.2), as it lives in the Hilbert space L2(G)
– For your last point on Step 2 of our algo, we can also use the simple integrate command.
-Unlike traditional prior-data conflict here we attempted to answer three questions in one-shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of misfit–finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal guideline for upgrading (repairing) the starting g.
Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.

## double yolk priors

Posted in Statistics with tags , , , , on March 13, 2018 by xi'an

“To develop a “defendable and defensible” Bayesian learning model, we have to go beyond blindly ‘turning the crank’ based on a “go-as-you-like” [approximate guess] prior. A lackluster attitude towards prior modeling could lead to disastrous inference, impacting various fields from clinical drug development to presidential election forecasts. The real questions are: How can we uncover the blind spots of the conventional wisdom-based prior? How can we develop the science of prior model-building that combines both data and science [DS-prior] in a testable manner – a double-yolk Bayesian egg?”

I came through R bloggers on this presentation of a paper by Subhadeep Mukhopadhyay and Douglas Fletcher, Bayesian modelling via goodness of fit, that aims at solving all existing problems with classical Bayesian solutions, apparently! (With also apparently no awareness of David Spiegelhalter’s take on the matter.) As illustrated by both quotes, above and below:

“The two key issues of modern Bayesian statistics are: (i) establishing principled approach for distilling statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis work ow that is more effective than either of the two separately.”

(I wonder who else in this Universe would characterise “modern Bayesian statistics” in such a non-Bayesian way! And love the notion of distillation applied to priors!) The setup is actually one of empirical Bayes inference where repeated values of the parameter θ drawn from the prior are behind independent observations. Which is not the usual framework for a statistical analysis, where a single value of the parameter is supposed to hide behind the data, but most convenient for frequency based arguments behind empirical Bayes methods (which is the case here). The paper adopts a far-from-modern discourse on the “truth” of “the” prior… (Which is always conjugate in that Universe!) Instead of recognising the relativity of a statistical analysis based on a given prior.

When I tried to read the paper any further, I hit a wall as I could not understand the principle described therein. And how it “consolidates Bayes and frequentist, parametric and nonparametric, subjective and objective, quantile and information-theoretic philosophies.”. Presumably the lack of oxygen at the altitude of Chamonix…. Given an “initial guess” at the prior, g, a conjugate prior (in dimension one with an invertible cdf), a family of priors is created in what first looks like a form of non-parametric exponential tilting of g. But a closer look [at (2.1)] exposes the “family” as the tautological π(θ)=g(θ)x π(θ)/g(θ). The ratio is expanded into a Legendre polynomial series. Which use in Bayesian statistics dates a wee bit further back than indicated in the paper (see, e.g., Friedman, 1985; Diaconis, 1986). With the side issue that the resulting approximation does not integrate to one. Another side issue is that the coefficients of the Legendre truncated series are approximated by simulations from the prior [Step 3 of the Type II algorithm], rarely an efficient approach to the posterior.