My remark is based solely on the definition found in your Bayesian Analysis paper, p. 81-82, of the Bayesian evidence against H: it considers the posterior probability of the event that the posterior density is larger than the maximum posterior density under the null.

I apologize for entering the discussion. I am very happy to see our e-value being discussed by VIP statisticians.

If I understood correctly, the host of this important page, Dr Robert, is against the Bayesian method because he objects to computing the probabilities of events defined through the posterior density.

I would like to say only that I do not necessarily need data to compute e-values. If one gives me a density and a hypothesis, I can compute the e-value. Let E be the set of points whose density is smaller than the density of at least one point of the set that defines the hypothesis. The e-value is the probability of E! So I did not use any data; I only used the well-known probability space. Our host is a great expert in MCMC, which uses frequencies over a very large number of iterations. Can I say that he is a frequentist?
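To make this point concrete, here is a toy numerical sketch of my own (not from the original comment): take the density to be a standard normal and the point hypothesis H: t = 1. Then E = {t : p(t) < p(1)} = {t : |t| > 1}, and the e-value is simply Pr(E), computed with no data at all.

```python
import math

def std_normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Hypothesis H: t = 1 (a single point); density p = standard normal N(0, 1).
t0 = 1.0

# E = set of points with density smaller than the density of some point of H
#   = {t : p(t) < p(t0)} = {t : |t| > |t0|}  (by symmetry and unimodality).
# Its probability needs no data, only the probability space itself:
ev = 2.0 * (1.0 - std_normal_cdf(abs(t0)))
print(round(ev, 4))  # -> 0.3173, i.e. Pr(|T| > 1) under N(0, 1)
```

No sample enters the computation: only the density and the hypothesis set are used.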

(1) IGPL stands for Interest Group in Pure and Applied Logics (as stated in Arnold’s comment). We have published two articles in the Logic Journal of the IGPL: Borges and Stern (2007) and Stern and Pereira (2014).

(2) I wonder why you say that the e-value uses the data twice (?!)

(2a) The prior-posterior update, p_n(t) = c_n * p_0(t) * L(t | x_1, …, x_n), incorporates (once) the information in the likelihood function into the posterior distribution, p_n(t). Hence, a distribution p_2n(t) = c_2n * p_n(t) * L(t | x_1, …, x_n) would indeed incorporate the same information twice. In this sense, I fully agree with you that Murray Aitkin’s procedure uses the data twice. However, in the construction of the e-value, we do nothing even remotely related to this double use of the information contained in the likelihood function.
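A hypothetical conjugate sketch (a Beta-Bernoulli model of my own choosing, not an example from the comment) makes the contrast concrete: using the likelihood once gives the ordinary posterior, while multiplying the posterior by the same likelihood again yields an artificially over-concentrated distribution.

```python
# Beta(a, b) prior with a Bernoulli likelihood: k successes in n trials.
# Single use of the likelihood:  posterior   Beta(a + k, b + n - k).
# Double use (Aitkin-style):     "posterior" Beta(a + 2k, b + 2(n - k)).

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

a, b = 1.0, 1.0    # uniform prior
k, n = 7, 10       # observed data: 7 successes in 10 trials

var_once  = beta_var(a + k, b + n - k)            # likelihood used once
var_twice = beta_var(a + 2 * k, b + 2 * (n - k))  # same likelihood reused

# Reusing the same data fakes extra information: the distribution tightens.
print(var_once > var_twice)  # -> True
```

The double-use "posterior" behaves as if twice the sample had been observed, which is exactly the pathology the e-value construction avoids.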

(2b) Do you say that the e-value uses the data twice just because the e-value uses a Reference density and a Prior distribution? Please note that these two distributions have very distinct roles. As in any standard Bayesian model, the Prior distribution, p_0(t), represents the initial information available to the statistician in his or her modeling context. In contrast, the Reference density, r(t), has a very different role: it is used to define the Surprise function, s_n(t) = p_n(t) / r(t). The Reference density, r(t), represents the standard geometry of the parameter space (in its ground-level, minimum-information or maximum-entropy state). Please note that most statistical models assume a standard metric for the parameter space, dl^2 = dt' G(t) dt, where G(t) is a metric tensor. The e-value explicitly takes this fact into account, in order to build an invariant procedure. Once again, we are not using any data twice (no pun intended).

(2c) Do you say that the e-value uses the data twice just because the e-value is built in a two-step procedure? Indeed, the e-value comprises an Optimization step, finding v* = max_{t in H} s_n(t), and an Integration step, ev(H | X) = Int_{ {t : s_n(t) <= v*} } p_n(t) dt. This two-step procedure achieves a significance measure that: (i) is fully compliant with the Likelihood Principle (as implied by Paulo's comment) and, therefore, cannot be accused of using the data twice; (ii) is fully invariant under reparameterizations of the parameter space or the hypothesis set; and (iii) has powerful compositional properties (logical rules for combining truth values). (iv) Moreover, although it was not initially conceived in a decision-theoretic framework, the FBST can be derived from an appropriate loss function; see Madruga et al. (2001).
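The two steps can be sketched numerically. This is my own toy setup (a normal posterior N(1, 1), a flat reference so that s_n = p_n, and the point hypothesis H: t = 0), not code from the papers: the optimization step is trivial for a point hypothesis, and the integration step is approximated by Monte Carlo over posterior draws.

```python
import math
import random

random.seed(0)

mu, sd = 1.0, 1.0   # toy posterior: t | X ~ N(1, 1)
t_H = 0.0           # point hypothesis H: t = 0

def p_n(t):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# With a flat reference density, the surprise function is the posterior itself.
s_n = p_n

# Optimization step: v* = max over H of s_n(t); H is a single point here.
v_star = s_n(t_H)

# Integration step: ev(H|X) = posterior probability of {t : s_n(t) <= v*},
# approximated by the fraction of posterior draws landing in that set.
draws = [random.gauss(mu, sd) for _ in range(200_000)]
ev_mc = sum(s_n(t) <= v_star for t in draws) / len(draws)

# Closed form for this unimodal, symmetric toy case:
# {t : s_n(t) <= s_n(t_H)} = {t : |t - mu| >= |t_H - mu|}.
z = abs(t_H - mu) / sd
ev_exact = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

assert abs(ev_mc - ev_exact) < 0.01
print(round(ev_exact, 4))  # -> 0.3173
```

For a composite hypothesis, the optimization step would instead require a numerical maximization of s_n over H; the integration step is unchanged.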

(3) For further discussion: this correspondence of ours started by discussing the technical similarities between the approaches for Model Choice in the papers Kamary et al. (2014) and Lauretto et al. (2007). In your paper Robert et al. (2011), you explain some difficulties in using ABC for Model Choice while relying on Bayes Factors. I believe that both of our aforementioned approaches could be used to overcome these difficulties. What do you say?

References:

– Borges, Stern (2007). The Rules of Logic Composition for the Bayesian Epistemic e-Values. Logic J.of the IGPL, 15, 5-6, 401-420.

– Lauretto, Faria, Pereira, Stern (2007). The Problem of Separate Hypotheses via Mixture Models. AIP Conference Proceedings, 954, 268-275.

– Kamary, Mengersen, Robert, and Rousseau (2014). Testing Hypotheses via a Mixture Model.

– Robert, Cornuet, Marin, and Pillai (2011). Lack of Confidence in Approximate Bayesian Computation Model Choice. PNAS, 108, 37, 15112-15117.

– Stern, Pereira (2014). Bayesian epistemic values: Focus on surprise, measure probability! Logic Journal of the IGPL, 22, 2, 236-254.

Have you discovered a student with that sheet during an exam?
