The Ethel Newbold Prize is to be awarded biannually to an outstanding statistical scientist for a body of work that represents excellence in research in mathematical statistics, and/or excellence in research that links developments in a substantive field to new advances in statistics. In any year in which the award is due, the prize will not be awarded unless the set of all nominations includes candidates from both genders.

and is funded by Wiley. I very much support this (inclusive) approach of “recognizing the importance of women in statistics”, without creating a prize restricted to women nominees (and hence exclusive). Thanks to the members of the Program Committee of the Bernoulli Society for setting up this prize, and to Nancy Reid in particular.

Ethel Newbold was a British statistician who worked during WWI in the Ministry of Munitions and then became a member of the newly created Medical Research Council, working on medical and industrial studies. She was the first woman to receive the Guy Medal in Silver, in 1928. Just to stress how much remains to be done towards gender balance, the second and so far last woman to get a Guy Medal in Silver was Sylvia Richardson, in 2009… (In addition, Valerie Isham, Nicky Best, and Fiona Steele got a Guy Medal in Bronze, out of the 71 awarded so far, while no woman has ever got a Guy Medal in Gold.) Funny coincidences: Ethel May Newbold was educated at Tunbridge Wells, the place where Bayes was a minister, while Sylvia is now head of the Medical Research Council Biostatistics Unit in Cambridge.

Filed under: Books, Kids, Statistics, University life Tagged: Bayesian non-parametrics, Bernoulli society, Brazil, Cambridge University, compound Poisson distribution, England, Ethel Newbold, Guy Medal, industrial statistics, ISI, Medical Research Council, Rio de Janeiro, Royal Statistical Society, Tunbridge Wells


**I**n this review paper, now published in *Statistical Analysis and Data Mining* 6, 3 (2013), David Parkinson and Andrew R. Liddle go over the (Bayesian) model selection and model averaging perspectives. Their argument in favour of model averaging is that model selection via Bayes factors may simply be too inconclusive to favour one model and only one model. While this is a correct perspective, this is about it for the theoretical background provided therein. The authors then move to the computational aspects and the first difficulty is their approximation (6) to the evidence

where they average the *likelihood x prior* terms over simulations from the posterior, which does not provide a valid (either unbiased or converging) approximation. They surprisingly fail to account for the huge statistical literature on evidence and Bayes factor approximation, incl. Chen, Shao and Ibrahim (2000). Which covers earlier developments like bridge sampling (Gelman and Meng, 1998).
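To make the trouble explicit, here is a sketch of what such a posterior average actually converges to (assuming (6) is the plain average just described, with the θ_i simulated from the posterior):

```latex
\frac{1}{N}\sum_{i=1}^N \pi(\theta_i)\,L(\theta_i\mid x)
\;\xrightarrow[N\to\infty]{}\;
\int \pi(\theta)\,L(\theta\mid x)\,
     \frac{\pi(\theta)\,L(\theta\mid x)}{Z}\,\mathrm{d}\theta
\;=\;
\frac{1}{Z}\int \pi(\theta)^2\,L(\theta\mid x)^2\,\mathrm{d}\theta
```

which differs from the evidence Z = ∫ π(θ)L(θ|x)dθ itself, hence neither unbiasedness nor convergence to the right quantity.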

As is often the case in astrophysics, at least since 2007, the authors’ description of nested sampling drifts away from perceiving it as a regular Monte Carlo technique, with the same n^{-1/2} convergence rate as other Monte Carlo techniques and the same dependence on dimension. It is certainly not the only simulation method where the produced “samples, as well as contributing to the evidence integral, can also be used as posterior samples.” The authors then move to “population Monte Carlo [which] is an adaptive form of importance sampling designed to give a good estimate of the evidence”, a particularly restrictive description of a generic adaptive importance sampling method (Cappé et al., 2004). The approximation of the evidence (9) based on PMC also seems invalid:

is missing the prior in the numerator. (The switch from θ in Section 3.1 to X in Section 3.4 is confusing.) Further, the sentence “PMC gives an unbiased estimator of the evidence in a very small number of such iterations” is misleading in that PMC is unbiased at each iteration. Reversible jump is not described at all (the supposedly higher efficiency of this algorithm is far from guaranteed when facing a small number of models, which is the case here, since the moves between models are governed by a random walk and the acceptance probabilities can be quite low).
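By contrast, a valid importance sampling estimate of the evidence keeps the prior in the numerator. A minimal R sketch, with a Gaussian toy model and a proposal entirely of my own choosing:

```r
# evidence Z = int prior(theta) * likelihood(theta) dtheta, estimated
# unbiasedly by averaging prior x likelihood / proposal over draws
# from the proposal q (all numerical values purely illustrative)
set.seed(1)
x <- 0.8                                # single observation
prior <- function(th) dnorm(th, 0, 2)   # theta ~ N(0, 4)
lik   <- function(th) dnorm(x, th, 1)   # x | theta ~ N(theta, 1)
qdens <- function(th) dnorm(th, 0, 3)   # wider Gaussian proposal
th    <- rnorm(1e5, 0, 3)               # draws from the proposal
Zhat  <- mean(prior(th) * lik(th) / qdens(th))
# closed form for comparison: marginally, x ~ N(0, 4 + 1)
Z <- dnorm(x, 0, sqrt(5))
```

Dropping the prior from the numerator, as (9) appears to do, would instead estimate ∫ L(θ|x)dθ rather than the evidence.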

The second quite unrelated part of the paper covers published applications in astrophysics. Unrelated because the three different methods exposed in the first part are not compared on the same dataset. Model averaging is obviously based on a computational device that explores the posteriors of the different models under comparison (or, rather, averaging); however, no recommendation is found in the paper as to how to efficiently implement the averaging or anything of the kind. In conclusion, I thus find this review somewhat anticlimactic.

Filed under: Books, Statistics, University life Tagged: adaptive importance sampling, Astrophysics, Bayes factor, bridge sampling, computational statistics, evidence, likelihood, model averaging, Monte Carlo technique, population Monte Carlo, statistical analysis and data mining

Since the issue was covered in so many places, I just spent one hour or so constructing a basic solution à la Fibonacci and then tried to improve it against a length criterion. Here are my R codes (using the numbers library):

library(numbers) #for primeFactors() and div()

osiris=function(a,b){
  #can the fraction a/b be simplified?
  diva=primeFactors(a)
  divb=primeFactors(b)
  divc=c(unique(diva),unique(divb))
  while (sum(duplicated(divc))>0){
    n=divc[duplicated(divc)]
    for (i in n){a=div(a,i);b=div(b,i)}
    diva=primeFactors(a)
    divb=primeFactors(b)
    divc=c(unique(diva),unique(divb))
  }
  return(list(a=a,b=b))
}

presumably superfluous for simplifying fractions

horus=function(a,b,teth=NULL){
  #simplification
  anubis=osiris(a,b)
  a=anubis$a;b=anubis$b
  #decomposition by removing 1/b
  isis=NULL
  if (!(b %in% teth)){
    a=a-1
    isis=c(isis,b)
    teth=c(teth,b)}
  if (a>0){
    #simplification
    anubis=osiris(a,b)
    bet=b;a=anubis$a;b=anubis$b
    if (bet>b){
      isis=c(isis,horus(a,b,teth))}else{
      # find largest integer
      k=ceiling(b/a)
      while (k %in% teth) k=k+1
      a=k*a-b
      b=k*b
      isis=c(isis,k,horus(a,b,teth=c(teth,k)))
    }}
  return(isis)}

which produces a Fibonacci solution (with the additional inclusion of the original denominator) and

nut=20
seth=function(a,b,isis=NULL){
  #simplification
  anubis=osiris(a,b)
  a=anubis$a;b=anubis$b
  if ((a==1)&(!(b %in% isis))){
    isis=c(isis,b)
  }else{
    ra=hapy=ceiling(b/a)
    if (max(a,b)<1e5) hapy=horus(a,b,teth=isis)
    k=unique(c(hapy,ceiling(ra/runif(nut,min=.1,max=1))))
    propa=propb=propc=propd=rep(NaN,le=length((k %in% isis)))
    bastet=1
    for (i in k[!(k %in% isis)]){
      propa[bastet]=i*a-b
      propb[bastet]=i*b
      propc[bastet]=i
      propd[bastet]=length(horus(i*a-b,i*b,teth=c(isis,i)))
      bastet=bastet+1
    }
    k=propc[order(propd)[1]]
    isis=seth(k*a-b,k*b,isis=c(isis,k))
  }
  return(isis)}

which compares solutions against their lengths. When calling those functions on the three fractions above, the solutions are

> seth(2,5)
[1] 15 3
> seth(5,12)
[1] 12 3
> seth(50,77)
[1] 2 154 7

with no pretension whatsoever to return anything optimal (and with occasional failures, like crashes, when the magnitude of the entries grows; try for instance 5/121). For this last counter-example, the alternative horus works superbly:

> horus(5,121)
[1] 121 31 3751 1876 7036876
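For comparison, the bare greedy (Fibonacci-Sylvester) scheme that horus elaborates upon can be written in a few lines; this is only a sketch, with no simplification step and no handling of repeated denominators, and its denominators can exceed exact double precision for inputs like 5/121:

```r
# greedy Egyptian fraction expansion of a/b: repeatedly subtract the
# largest unit fraction 1/k with k = ceiling(b/a); since
# a/b - 1/k = (a*k - b)/(b*k) with a*k - b < a, it always terminates
greedy <- function(a, b) {
  dens <- c()
  while (a > 0) {
    k <- ceiling(b / a)
    dens <- c(dens, k)
    a <- a * k - b
    b <- b * k
  }
  dens
}
greedy(2, 5)   # 3 15, i.e. 2/5 = 1/3 + 1/15
greedy(5, 12)  # 3 12, i.e. 5/12 = 1/3 + 1/12
```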

Filed under: Books, Kids, R Tagged: Egyptian fractions, Fibonacci, greedy algorithm, Le Monde, Liber Abaci, mathematical puzzle, numerics, Rhind papyrus

“As the cosmological data continues to improve with its inevitable twists, it has become evident that whatever the observations turn out to be they will be lauded as ‘proof of inflation’.”G. Gubitosi et al.

**I**n an arXiv preprint with the above title, Gubitosi et al. embark upon a generic and critical [and astrostatistical] evaluation of Bayesian evidence and the Bayesian paradigm. Perfect topic and material for another blog post!

“Part of the problem stems from the widespread use of the concept of Bayesian evidence and the Bayes factor (…) The limitations of the existing formalism emerge, however, as soon as we insist on falsifiability as a pre-requisite for a scientific theory (…) the concept is more suited to playing the lottery than to enforcing falsifiability: winning is more important than being predictive.”G. Gubitosi et al.

It is somehow quite hard *not* to quote most of the paper, because prose such as the above abounds. Now, compared with the standard setting, the authors introduce a higher level than models, called *paradigms*, as collections of models. (I wonder what the next level is, monads? universes? paradises?) Each paradigm is associated with a marginal likelihood, obtained by integrating over models and model parameters. Which is also the evidence of or for the paradigm. And then, assuming a prior on the paradigms, one can compute the posterior over the paradigms… What is the novelty, then, that “forces” falsifiability upon Bayesian testing (or the reverse)?!

“However, science is not about playing the lottery and winning, but falsifiability instead, that is, about winning given that you have bore the full brunt of potential loss, by taking full chances of not winning a priori. This is not well incorporated into the Bayesian evidence because the framework is designed for other ends, those of model selection rather than paradigm evaluation.”G. Gubitosi et al.

The paper starts with a criticism of the Bayes factor in the point-null test of a Gaussian mean, as overly penalising the null against an alternative that is only a power law. Not much new there, it is well known that the Bayes factor does not converge at the same speed under the null and under the alternative… The first proposal of those authors is to consider the distribution of the marginal likelihood of the null model under the [or a] prior predictive encompassing both hypotheses or only the alternative *[there is a lack of precision at this stage of the paper]*, in order to calibrate the observed value against the expected one. What is the connection with falsifiability? The notion that, under the prior predictive, most of the mass is on very low values of the evidence, leading to concluding against the null. If the null is replaced with the alternative marginal likelihood, its mass then becomes concentrated on the largest values of the evidence, which is translated as an *unfalsifiable* theory. In simpler terms, it means you can never prove a mean θ is different from zero. Not a tremendous item of news, all things considered…
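A toy rendering of this calibration in the Gaussian point-null case, with prior scale and threshold values of my own choosing:

```r
# x ~ N(theta, 1); null: theta = 0; alternative: theta ~ N(0, tau2);
# the evidence for the null at data x is m0(x) = dnorm(x, 0, 1);
# look at its distribution when x instead comes from the
# alternative's prior predictive, x ~ N(0, 1 + tau2)
set.seed(2)
tau2 <- 9
x_alt <- rnorm(1e5, 0, sqrt(1 + tau2))
m0 <- dnorm(x_alt, 0, 1)
# fraction of the predictive mass on values of m0 below the value
# reached at the usual z = 1.96 cutoff
mean(m0 < dnorm(1.96, 0, 1))
```

Most of the predictive mass sits on small values of the null evidence, which is the sense in which the calibration “concludes against the null”.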

“…we can measure the predictivity of a model (or paradigm) by examining the distribution of the Bayesian evidence assuming uniformly distributed data.”G. Gubitosi et al.

The alternative is to define a tail probability for the evidence, i.e., the probability of being below an arbitrarily set bound. What remains unclear to me in this notion is the definition of a prior on the data, as it seems to be model *dependent*, hence prohibiting comparisons between models since these would involve incompatible priors. The paper goes further in that direction by penalising models according to their predictivity, P, as exp{-(1-P²)/P²}. And paradigms as well.
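Spelled out, with P the predictivity, the proposed penalty weight is

```latex
w(P) \;=\; \exp\bigl\{-(1-P^2)/P^2\bigr\}
```

so that w(1) = 1 for a maximally predictive model while w(P) → 0 as P → 0.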

“(…) theoretical matters may end up being far more relevant than any probabilistic issues, of whatever nature. The fact that inflation is not an unavoidable part of any quantum gravity framework may prove to be its greatest undoing.”G. Gubitosi et al.

Establishing a principled way to weight models would certainly be a major step in the validation of posterior probabilities as a quantitative tool for Bayesian inference, as hinted at in my 1993 paper on the Lindley-Jeffreys paradox, but I do not see such a principle emerging from the paper. Not only because of the arbitrariness in constructing both the predictivity and the associated prior weight, but also because of the impossibility of defining a joint predictive, that is, a predictive across models, without including the weights of those models. This makes the prior probabilities appear on “both sides” of the defining equation… (And I will not mention the issues of constructing a prior distribution of a Bayes factor that are related to Aitkin‘s integrated likelihood. And won’t obviously try to enter the cosmological debate about inflation.)

Filed under: Books, pictures, Statistics, University life Tagged: astrostatistics, Bayes factor, Bayesian model choice, Bayesian paradigm, Ewan Cameron, Gottfried Leibnitz, Imperial College London, inflation, Karl Popper, monad, paradigm shift, Peter Coles, quantum gravity



- Weighted ABC: a new strategy for cluster strong lensing cosmology with simulations, by Madhura Killedar et al. *[Madhura won one of the three prizes at the BAYSM meeting last year]*:

*“We investigate the uncertainty in the calculated likelihood, and consequential ability to compare competing cosmologies…”*

- Inflation, evidence and falsifiability, by Giulia Gubitosi et al.:

*“By considering toy models we illustrate how unfalsifiable models and paradigms are always favoured by the Bayes factor…”*

- Bayesian model selection without evidences: application to the dark energy equation-of-state, by Sonke Hee et al.:

*“A method is presented for Bayesian model selection without explicitly computing evidences … without the need for reversible jump MCMC techniques.”*

Filed under: pictures, Statistics, University life Tagged: ABC, ABC-MCMC, arXiv, astrostatistics, Bayes factor, Bayesian model selection, BAYSM 2014, dark energy, evidence, Ewan Cameron, falsification, reversible jump MCMC, Vienna

A pocket calculator with ten keys (0,1,…,9) starts with a random digit n between 0 and 9. A number on the screen can then be modified into another number by two rules:

1. pressing k changes the k-th digit v (whenever it exists) into the two digits (v+1)(v+2), where addition is modulo 10;

2. pressing 0k deletes the (k-1)-th and (k+1)-th digits if they both exist and are identical (otherwise nothing happens).

Which 9-digit numbers can always be produced whatever the initial digit?
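Under my reading of the statement (one digit becoming a pair, and a matching pair around the pressed position being deleted), the two rules translate into the following operations on a vector of digits; the function names and this decomposition are mine:

```r
# rule 1: pressing k turns the k-th digit v into the two digits
# (v+1, v+2), modulo 10, lengthening the number by one
press <- function(x, k) {
  v <- x[k]
  append(x[-k], c((v + 1) %% 10, (v + 2) %% 10), after = k - 1)
}
# rule 2: pressing 0k removes digits k-1 and k+1 when both exist and
# are identical; otherwise the number is left unchanged
press0 <- function(x, k) {
  if (k > 1 && k < length(x) && x[k - 1] == x[k + 1])
    x[-c(k - 1, k + 1)] else x
}
press(c(4), 1)         # 4 -> 56
press0(c(5, 6, 5), 2)  # 565 -> 6
```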

I did not find an easy entry to this puzzle, in particular because it did not state what to do once 9 digits had been reached: would the extra digits disappear? But then, those to the left or to the right? The description also fails to explain how to handle n=000 000 004 versus n=4.

Instead, I tried to look at the numbers with fewer than 7 digits that could appear, using some extra rules of my own, like preventing numbers with more than 9 digits. Rules which resulted in a sure stopping rule when applying both rules above at random:

leplein=rep(0,1e6)
for (v in 1:1e6){
  x=as.vector(sample(1:9,1))
  for (t in 1:1e5){
    k=length(x) #as sequence of digits
    if (k<3){
      i=sample(rep(1:k,2),1)
      x[i]=(x[i]+1)%%10
      y=c(x[1:i],(x[i]+1)%%10)
      if (i<k){ x=c(y,x[(i+1):k])}else{ x=y}
    }else{
      prop1=prop2=NULL
      difs=(2:(k-1))[abs(x[-(1:2)]-x[-((k-1):k)])==0]
      if (length(difs)>0) prop1=sample(rep(difs,2),1)
      if (k<9) prop2=sample(rep(1:k,2),1)
      if (length(c(prop1,prop2))>1){
        if (runif(1)<.5){
          x[prop2]=(x[prop2]+1)%%10
          y=c(x[1:prop2],(x[prop2]+1)%%10)
          if (prop2<k){ x=c(y,x[(prop2+1):k])}else{ x=y}
        }else{
          x=x[-c(prop1-1,prop1+1)]}
        while ((length(x)>1)&(x[1]==0)) x=x[-1]}
      if (length(c(prop1,prop2))==1){
        if (is.null(prop2)){
          x=x[-c(prop1-1,prop1+1)]
        }else{
          x[prop2]=(x[prop2]+1)%%10
          y=c(x[1:prop2],(x[prop2]+1)%%10)
          if (prop2<k){ x=c(y,x[(prop2+1):k])
          }else{ x=y}
          x=c(x[1:(prop2-1)],
              (x[prop2]+1)%%10,
              (x[prop2]+2)%%10,x[(prop2+1):k])}
        while ((length(x)>1)&(x[1]==0)) x=x[-1]}
      if (length(c(prop1,prop2))==0) break()
    }
    k=length(x)
    if (k<7) leplein[sum(x*10^((k-1):0))]=
        leplein[sum(x*10^((k-1):0))]+1
  }}

code that fills an occupancy table for the numbers less than a million over 10⁶ iterations. The solution as shown below (with the number of zero entries over each column) is rather surprising in that it shows an occupancy that is quite regular over a grid. While it does not answer the original question…

Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle

*“It is necessary to be vigilant to ensure that attempts to be mathematically general do not lead us to introduce absurdities into discussions of inference.” (p.8)*

**T**his new book by Michael Evans (Toronto) summarises his views on statistical evidence (expanded in a large number of papers), which are a quite unique mix of Bayesian principles and less-Bayesian methodologies. I am quite glad I could receive a version of the book before it was published by CRC Press, thanks to Rob Carver (and Keith O’Rourke for warning me about it). *[Warning: this is a rather long review and post, so readers may choose to opt out now!]*

“The Bayes factor does not behave appropriately as a measure of belief, but it does behave appropriately as a measure of evidence.” (p.87)

First, Evans’ perspective on continuous models and measurability issues is that those are only approximations of true models, which can only be defined on finite sets. (I know this is also Keith’s perspective, so he should appreciate it!) Any measure-theoretic inconsistency like the Dickey-Savage paradox (central to Evans’ approach) can then be attributed to continuous features that vanish in the finite case. Easy does it! There is even an Appendix on “The definition of a density”, following Rudin’s (1974) definition of densities as limits. And hence much more topological than the standard Lebesgue definition. Which makes those densities continuous, for instance, and avoids the selection of a “nice” version of the density.
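As I read the Appendix (hedging, since I only paraphrase it here), the density of a probability measure P against Lebesgue measure λ is defined through the limit

```latex
f(x) \;=\; \lim_{\varepsilon\to 0}
\frac{P\bigl(B_\varepsilon(x)\bigr)}{\lambda\bigl(B_\varepsilon(x)\bigr)}
```

wherever it exists, which singles out a canonical version of the density rather than an arbitrary member of the almost-everywhere equivalence class.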

“It seems clear that there is only one aspect of a statistical investigation that can ever truly be claimed to be objective, namely, the observed data (…) it can be claimed that the data are objective [when] the control over the data selection process is entirely through the random system.” (p.12)

The first chapter has a quite fascinating discussion about objectivity and subjectivity, under the heading of empirical criticism, and I tend to side with Evans’ arguments on the inherent subjectivity of statistical analyses and the need for empirically checking every aspect of those analyses. For instance, the way frequentism is handled. There is just the point made in the quote above that seems unclear, in that it implies an intrinsic belief in the model, which should be held with the utmost suspicion! That is, the data is almost certainly unrelated to the postulated model since all models are wrong *et cetera*…

“First, randomness has nothing to do with probability. Second, there is no statistical test for randomness.” (p.49)

It is sort of getting rarer and rarer to see statistics books exposing the various concepts or meaning of probability, rather than merely presenting (not so) standard measure theory. But this is what Chapter 2 in Evans’ book does, discussing all sorts of axiomatics for defining probability. And including all sorts of paradoxes like the Monty Hall problem or the Borel paradox as examples. (Speaking of paradoxes, the Jeffreys-Lindley paradox is discussed in the next chapter and blamed on the Bayes factor and its lack of “calibration as a measure of evidence” (p.84). The book claims a resolution of the paradox on p.132 by showing confluence between the p-value and the relative belief ratio. This simply shows confluence with the p-value in my opinion.)
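The paradox itself is easy to reproduce numerically: keeping the z-statistic, hence the p-value, fixed while the sample size grows drives the Bayes factor towards the null (the prior scale below is my own illustrative choice):

```r
# Jeffreys-Lindley paradox sketch: xbar ~ N(theta, 1/n), testing
# H0: theta = 0 against H1: theta ~ N(0, tau2), with z fixed at 1.96
tau2 <- 1
bf01 <- function(n) {
  xbar <- 1.96 / sqrt(n)  # the p-value stays at 0.05
  dnorm(xbar, 0, sqrt(1/n)) / dnorm(xbar, 0, sqrt(tau2 + 1/n))
}
sapply(c(10, 1e3, 1e5), bf01)  # grows roughly like sqrt(n)
```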

“It is not clear how one checks a model using the pure likelihood (…) Overall, pure likelihood theory does not lead to a fully satisfactory theory of inference.” (p.58)

“…a common misconception [seems to be] that Bayesian inferences are only based on the posterior but, with the exception of probability statements as determined by the principle of conditional probability, there is nothing to support this view.” (p.73)

After presenting the classical approaches from “pure” likelihood to p-values and Neyman-Pearson tests to Bayesian inferences (mind the *s*!), including his explanation as to why the likelihood principle (L) does not hold as a consequence of the sufficiency (S) and conditionality (C) principles (basically because the conditionality principle is not an equivalence relation), i.e.,

S ∪ C ⊂ L ⊂ S ∪ C,

including this rather puzzling quote about Bayesian inference(s)—with which I cannot agree—Michael Evans defines his own version of evidence, namely the relative belief ratio, which could also be called the Savage-Dickey ratio, being the ratio of the posterior over the prior density at a specific parameter value, and he expands on the various properties of this ratio. The estimator he advocates in association with this evidence is the maximum relative belief estimator, maximising the relative belief ratio, another type of MAP then. With the same drawbacks as the MAP: it depends on the dominating measure and is not associated with a loss function in the continuous case. Even in the finite case, the associated loss is an indicator function divided by the prior, which sounds highly counter-intuitive.
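In symbols, for a parameter (or quantity of interest) ψ with prior density π and posterior density π(·|x), the relative belief ratio and the associated estimator are

```latex
RB(\psi \mid x) \;=\; \frac{\pi(\psi \mid x)}{\pi(\psi)},
\qquad
\hat{\psi} \;=\; \arg\max_{\psi}\, RB(\psi \mid x),
```

and at a point null the ratio coincides with the Savage-Dickey representation of the Bayes factor.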

“A [frequentist] theory of inference suffers from two main defects. First, there does not seem to be a good answer to the question of why it is necessary to consider the frequency properties of statistical procedures. Without this justification, basing statistical procedures on the principle of frequentism seems like a weak foundation (…) Second, it seems almost misleading to refer to the frequentist theory of inference because it does not exist in the sense that such a theory can be applied to statistical problems [without] a guaranteed sensible answer or, for that matter, even an answer.” (p.71)

A major surprise for me when reading the book is that Evans ends up with a solution [for assessing the *strength* of the evidence] that is [acknowledged to be] very close (or even equivalent) to Murray Aitkin’s integrated likelihood approach! An approach much discussed on the ‘Og. And in a Statistics and Risk Modeling paper with Andrew and Judith. Indeed, the strength of a relative belief ratio expressed as a ratio of marginals is a posterior p-value associated with this ratio. Which again has the drawbacks of not being defined a priori, of using the data twice, and of being defined on one model versus the other. Solving the Lindley-Jeffreys paradox—in the understanding of a clash between the Bayesian and frequentist answers—this way is then no major surprise, for this was one major argument in Murray Aitkin’s support of his approach (as seen for instance in the 1991 Read Paper).
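A conjugate toy version of the ratio and of its strength (the model and numerical values are mine, for illustration only):

```r
# x ~ N(theta, 1), theta ~ N(0, tau2): the relative belief ratio at
# theta0 is posterior density / prior density, and its strength is
# the posterior probability of a relative belief no larger than the
# one reached at theta0
set.seed(3)
x <- 1.5; tau2 <- 10
pv <- tau2 / (1 + tau2)   # posterior variance
pm <- pv * x              # posterior mean
rb <- function(th) dnorm(th, pm, sqrt(pv)) / dnorm(th, 0, sqrt(tau2))
rb0 <- rb(0)              # evidence about theta0 = 0
th <- rnorm(1e5, pm, sqrt(pv))
strength <- mean(rb(th) <= rb0)
```

A ratio above one reads as evidence in favour of θ0, and the strength then calibrates how strong that evidence is.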

“Current attempts at developing a theory based upon improper priors have close connections with frequentist ideas but, as already discussed, there are issues with the principle of frequentism itself that remain unsolved.” (p.170)

Pursuing this unique mix of Bayesian and extra-Bayesian principles, Evans also acknowledges a connection with Mayo and Spanos’ *error*-*statistics* philosophy of science and *severity tests*, although, since he relies on a marginal likelihood for this purpose, this should clash with those authors’ arguments.

“There are a variety of problems associated with improper priors. Perhaps the most obvious one is that there is no guarantee that [the marginal is finite] (…) Also, when Π is improper, it cannot be the case that m represents a probability distribution and so all applications of the prior predictive that rely on it are lost.” (p.173)

Since the ratio is defined in terms of densities, the whole approach does not allow for improper priors. A fact acknowledged in the chapter on model and prior checking. Some of the criticisms are standard, including the marginalisation paradox and the impossibility of defining marginal priors. Some are less standard, like the above claim that the constraint of finiteness makes the prior data dependent (!) [no, it should be imposed uniformly, excluding for instance Haldane’s prior] or the conclusion of that section that “it is not clear how to measure evidence in such a context.” (p.176) Looking at the bigger picture, this pessimistic conclusion is in line with (a) the global perspective adopted in the book that everything is finite, which leaves little room or use for infinite mass priors, and (b) the general difficulty in handling improper priors in testing. Although this was precisely the reason for Murray Aitkin to introduce his integrated likelihood paradigm, close to the current proposal (p.119).

“Our preference is to approach all our inference problems using the relative belief ratio. At least a part of the motivation for this lies with a desire to avoid the prescription that, with continuous models, the Bayes factor must be based on the mixture prior as in Jeffreys’ definition.” (p.146)

The above remark is a very interesting point and one bound to appeal to critics of the mixture representation, like Andrew Gelman. However, given that the Dickey-Savage ratio is a representation of the Bayes factor in the point null case, I wonder how much of a difference this constitutes. Not mentioning the issue of defining a prior for the contingency of accepting the null. So in the end I do not see much of a difference. (In connection with the Bayes factor, and the use of the mixture prior, there is an inconsequential typo in Example 4.5.6, p.128, with a missing Dirac mass.)

“Some may argue that the sanctity of the prior is paramount, as it represents beliefs and any assessment of the suitability of the prior, or worse, attempts to modify the prior based on the results of this assessment, is incoherent.” (p.188)

I was eagerly and obviously waiting for the model choice chapter, but it somewhat failed to materialise! Chapter 5 is about model and prior checking, a sort of meta-goodness-of-fit check, but as far as I can see, the relative belief ratio methodology does not extend to the comparison of non-embedded models. The part on model checking is very limited and involves computing a p-value for the likelihood of a “discrepancy” statistic, conditional on the minimal sufficient statistic. Why minimal sufficient? Because otherwise the p-value would depend on the unknown parameter, or the discrepancy would have to be chosen ancillary. Which requires a certain degree of understanding about the model and excludes reasonably complex models, unless one uses only rank statistics or the like, which cannot be good for efficiency. I find the criticism therein of posterior checks rather unfortunate, because using the predictive has the strong appeal of (a) bypassing the dependence on the unknown parameter and (b) avoiding adhoqueries like the choice of discrepancies. I also have a general difficulty with using ancillaries and sufficient statistics because, even when they are non-trivial and well-identified, they remain a characteristic of the model: using those to check the model thus sounds fraught with danger.

“The developments concerning the assessment of bias and the checking of the ingredients is certainly very close to a frequentist approach. There is nothing contradictory about this, as there is no role for the posterior in these issues (…) While various approaches to Bayesian theory share features with relative belief, there are also key differences, such as not being decision based and making a sharp distinction between belief and evidence.” (p.212)

The second part on prior checking is quite original and challenging. While the above quote is somewhat imbalanced in its use of religious terms like sanctity and beliefs (!), I tend to concur with the idea that priors can be checked and compared. This is for instance the role of Bayesian robustness. Or of Bayes factors. Here, the way to identify prior-data conflict in practice and the consequence of a rejection of the prior remain vague, as it seems hard to avoid the data influencing the modification of the prior (besides the obvious, namely that identifying conflict leads to a modification). The only reasonable way is to set both a collection or family of priors and a modus vivendi for changing the prior, *before* the checking is done. But even then, assessing the coherence of the resulting construction is a huge question mark… There is however little incentive in using a marginal p-value (*m*-value?) especially when several priors are under comparison, since it requires to define a reference or preference prior. Or would it relate to a baseline model in the spirit of the revolutionary proposal of Simpson et al?

“Evidence is what causes beliefs to change and so evidence is measured by changes in belief.” (p.244)

Overall, and even though I would not advocate this approach to evidence, I find Michael Evans’ Measuring statistical evidence using relative belief a fascinating book, in that it defines a coherent and original approach to the vexing question of assessing and measuring evidence in a statistical problem. And spells out most vividly the issue of prior checking. As clear from the above, I somehow find the approach lacking in several foundational and methodological aspects, maybe the most strident one being that the approach is burdened with a number of arbitrary choices, lacking the unitarian feeling associated with a regular Bayesian decisional approach. I also wonder at the scaling features of the method, namely how it can cope with high dimensional or otherwise complex models, without going all the way to ask for an ABC version! In conclusion, the book is a great opportunity to discuss this approach and to oppose it to potential alternatives, hopefully generating incoming papers and talks.

*[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]*

Filed under: Books, Statistics, University life Tagged: ABC, Bayes factor, CHANCE, CRC Press, discrepancies, Error and Inference, improper prior, integrated likelihood, Jeffreys-Lindley paradox, Likelihood Principle, marginalisation paradoxes, model checking, model validation, Monty Hall problem, Murray Aitkin, p-value, point null hypotheses, relative belief ratio, University of Toronto