## Quadrature methods for evidence approximation

Two papers written by astronomers have been recently posted on arXiv about (new) ways to approximate evidence. Since they both perceive those approximations as some advanced form of quadrature, they are close enough that a comparison makes sense.

The paper by Rutger van Haasteren uses a Voronoi tessellation to represent the evidence as $\int f(x) dx \approx \sum f(x_i) O_i$

when the $x_i$‘s are simulated from the normalised version of $f$ and the $O_i$‘s are the associated Voronoi cells. This approximation converges (even when the $x_i$‘s are not simulated from the right distribution) but it cannot be used in practice because of the cost of the Voronoi tessellation. Instead, Rutger van Haasteren suggests using a sort of an approximate HPD region $F_t$ and its volume, $V_t$, along with an harmonic mean within the HPD region: $\int f(x) dx \approx V_t N \big/ \sum_{x_i \in F_t} 1/f(x_i)$

where $N$ is the total number of simulations. So in the end this solution is actually the one proposed in our paper with Darren Wraith, as described in this earlier post! It is thus nice to see an application of this idea in a realistic situation, with performances that compare with nested sampling in its MultiNest version of Feroz, Hobson and Bridges. (This is especially valuable when considering that nested sampling is often presented as the only solution to approximating evidence.)

The second paper by Martin Weinberg also adopt a quadrature perspective, while integrating in the Lebesgue sense rather than in the Riemann sense. This perspective applies to nested sampling even though John Skilling does not justify nested sampling that way but  Martin Weinberg also shows that the (infamous) harmonic mean estimator also is a Lebesgue-quadrature approximation. The solution proposed in the paper is a different kind of truncation on the functional values, that relates more to nested sampling and on which I hope to report more thoroughly later.

### 6 Responses to “Quadrature methods for evidence approximation”

1. Another harmonic mean approximation « Xi'an's Og Says:

[…] that is submitted to Bayesian Analysis. I have already mentioned this paper in a previous post, but I remain unconvinced of the appeal of the paper, given that it recovers the harmonic mean […]

2. Rescaling data for evidence « Xi'an's Og Says:

[…] data for evidence Following a comment by Mark Fisher, I went back to the analysis of the benchmark dataset of the galaxy radial […]

3. Mark Fisher Says:

Thanks for your response. I have a number of comments and questions.

I now see in Figure 5 (in your paper with Lee, Marin, and Mengersen) that the estimated means cannot be associated with the original data. Moreover, your statement of the prior in Example 8 (where you treat the galaxy data) is centered around zero. So there are clues in the paper! But if you continue to report the numbers in Table 3, I hope you will be more explicit about the fact that you are fitting a transformation of the galaxy data. (See my next comment/question.)

I now realize that the huge factor comes from Jacobian of the transformation. Let sd denote the standard deviation used to transform the data and let n denote the number of observations. Then the factor is sd^n. By the way, why not simply report this factor or, better yet, subtract n*log(sd) from your reported marginal likelihoods to make them comparable to Chib’s (and others who might fit the data)?

Please forgive my ignorance here, but regarding the “stardardization”, I assume you have subtracted the empirical mean and divided by the emprical standard deviation. For the empirical standard deviation, do you divide by n or (n-1)? [I’m guessing you use (n-1).] It makes some difference since (81/82)^(82/2) is roughly .60.

You say that by transforming the data, you avoid the use of “empirical Bayes priors”. I’m sure I don’t understand the import of what you say. Let me try to explain the source of my confusion. You have used the empirical mean and stardard deviation to transform the data and then adopted a Gaussian prior for \mu_j with a mean of zero and a variance of 10 \sigma^2 (with a distribution for \sigma^2). Suppose instead you did not subtract the mean of the data (for example) but instead centered the prior on the empirical mean. (This is essentially what Chib does.) In what respect is that more or less an empirically-based prior? (Similarly, I believe it’s possible to make an adjustment to the variance of the prior.) What am I missing?

Your model and Chib’s model differ (in essence) because the priors are different. For his model with three components (the one in which both means and variances free), the log of the marginal likelihood (as computed by Neal) is -226.791. For your model with three components, the log of the marginal likelihood is -103.36 – 82*log(sd) = -227.917 [assuming you divide by (n-1) to get the sd]. So the Bayes factor favors Chib’s model/prior by a factor of about 3 (assuming I’m doing things properly).

–Mark

• xi'an Says:

Mark, thanks for the further comments.

Adding the further factor $n\log(s)$ from the reported marginal likelihood does not appear to be exact: there is indeed such a term appearing in the log-marginal likelihood, but the change of scale also impacts the integral against the parameters. This is indeed the whole issue with the determination of the prior, before or after standardisation. And I agree that both solutions are two versions of empirical Bayes techniques… No one is more nor less empirical than the other. (I am using R sd function so I indeed divide by (n-1).)

I am quite reluctant to use a Bayes factor to compare priors on the same model. I have seen this done in several papers but this does not sound either right or Bayesian to me… Especially when using empirical Bayes solutions: comparing marginal likelihoods always favours the prior highly concentrated around the mle (I think).

4. xi'an Says:

This is correct, there is a large difference between Chib’s and Neal’s marginal likelihoods and ours. This is due to the use of a standardised version of the Galaxy dataset that avoids the use of empirical Bayes priors… The use of a brute force solution as in Radford’s paper give the same numerical value as the averaged Chib’s solution. Thanks for pointing out this discrepancy.

5. Mark Fisher Says:

I came across papers of yours that compute the marginal likelihood for the galaxy data that Chib (1995) used as an example. I’m puzzled by the enormous difference in the numbers he presents in his Table 4 (which were largely confirmed by Neal) and the numbers you report in Table 1 of your paper with Marin. For example, for k = 3 (for a model with three unknown means and variances) you report -103.35 for the log of the marginal likelihood, while the comparable number from Chib is -224.138 (or as reported by Neal -228.608 + log(3!) = -226.8). If I’m interpreting the numbers correctly, the ratio is more than 10^50 (after exponentiating). I must be missing something.

–Mark

This site uses Akismet to reduce spam. Learn how your comment data is processed.