At this stage I’m just interested to see if it is possible to come up with any Bayesian estimator of c (let’s see if it is in fact possible before worrying about doing it efficiently).

Just thinking about this problem may help us understand the limits of Bayesian inference. As far as I can tell the question of whether there’s a Bayesian estimator for c is still open: there’s a significant technical problem (as you point out, we need a likelihood in which c is a parameter), but as far as I can tell there’s no proof that it can’t be done.

Importance sampling for c is basically just estimating a ratio from a set of samples, and it surprises me that something apparently so simple may be beyond the expressive power of Bayesian inference. Maybe I should be more reserved in advocating Bayesian methods to my colleagues and students!
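To make "estimating a ratio from a set of samples" concrete, here is a minimal sketch under assumptions that are not from the discussion itself: a toy unnormalised density g(x) = exp(-x²/2), whose true constant is c = √(2π), and a zero-mean Gaussian proposal with standard deviation 2. The function names are purely illustrative.

```python
import numpy as np

def unnormalized_density(x):
    # Toy target (an assumption for illustration): g(x) = exp(-x^2 / 2),
    # so the true normalising constant is c = sqrt(2 * pi).
    return np.exp(-0.5 * x**2)

def proposal_pdf(x, scale):
    # Density of the zero-mean Gaussian proposal with standard deviation `scale`.
    return np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

def importance_sampling_estimate(n_samples, scale=2.0, seed=0):
    # Draw from the proposal and average the ratios g(x) / q(x).
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=scale, size=n_samples)
    weights = unnormalized_density(x) / proposal_pdf(x, scale)
    return weights.mean()

c_hat = importance_sampling_estimate(100_000)
c_true = np.sqrt(2.0 * np.pi)
print(f"estimate: {c_hat:.4f}, true value: {c_true:.4f}")
```

Nothing in this sketch treats c as a parameter of the sampled x: the draws come from the proposal, and c only appears as the mean of the weights, which is exactly the difficulty raised above.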

Anyway, thanks for your time and thoughtful comments — I find your blog, your articles and your books very interesting.

]]>Thanks, Mark, for the additional comments. I am still unsure about your suggestion: using a discretization via balls is going against the notion that we know everything but c (and hence c in a way!) about the problem.

So using this approach or a Bayesian non-parametric approach to estimating f sounds like throwing away information. My gut feeling is that this paradox centres on calling c a parameter (of the sample), while it is not.

]]>Sorry to keep bothering you with this, but couldn’t we estimate (at least in principle) a likelihood P(r | c) (where r is an importance sampling run and c is the partition function) using techniques like the ones you use in ABC samplers? That is, approximate by putting epsilon balls around r and c? (I’m not sure what the right metric on r space would be, but you’re the expert here!). Then we could at least formally set up a Bayesian inference for c (even if we’re still unsure of what it would mean).

]]>I understand your qualms: what could the relevant probability distributions possibly be over? (“possible worlds” in which the laws of arithmetic are different?) But I think it really is quite parallel to the “speed of light” case: we have one or more “noisy measurements” of some unknown quantity that we want to combine in some way. (Can’t we think of an importance sampling run as a noisy measurement of c?)

Imagine we set estimating c using annealed importance sampling as a homework problem for a group of students, and we’d like to combine their answers to arrive at an even more accurate estimate of c. But each student used a different reference distribution, a different number of samples, a different annealing schedule, etc., so a simple average doesn’t make sense; what would a better method be?

There’s a technical problem for even an “in principle” Bayesian estimator for c that I don’t see a way around: as you point out, we can estimate P(c | r) where c is the partition function and r is an importance sampler run, but for Bayesian estimation we need a likelihood P(r | c), and I don’t see how to get this. (We can estimate P(r, c) from multiple importance sampler runs; could we use this to estimate P(r | c)? Of course c is a deterministic function of r so our estimate of P(r | c) would be a mixture of delta functions …)

]]>Argh, Mark, you are setting the debate back to stage one: I thought it was more or less settled (?!) that there is no such thing as a likelihood on the normalising constant?! Actually, in the previous post, I also pointed out that the numerical approximations of the constant are not estimates in the usual sense, since there is no unknown parameter. (Thanks for the comments, eh!)

]]>And once we have likelihoods, don’t we have Bayesian inference for c?

]]>Thanks, Larry, you got me stuck for maybe one minute... then I went running in the early morning and realised *this was just the same thing*: when using a stochastic approximation to the integral c,

say, the distribution of δ is given and known, at least formally, so I can exploit (in principle) this distribution to assess the variability of my evaluation δ. Obviously, in practice, the variance is also as unavailable as c,

and has itself to be approximated, leading to a sort of infinite regress. However, it is indeed *the same thing* (to me).
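As a rough illustration of assessing the variability of a simulation-based evaluation of c, the sketch below uses a plain importance sampling average and a CLT-style interval. The toy target g(x) = exp(-x²/2), the Gaussian proposal and all names are assumptions made for the example, not the approximation discussed in this comment. Note that the standard error is itself computed from the same simulated weights, which is the first step of the regress just mentioned.

```python
import numpy as np

def estimate_with_standard_error(n_samples, scale=2.0, seed=0):
    # Assumed toy setting: g(x) = exp(-x^2 / 2), true c = sqrt(2 * pi),
    # proposal q = Normal(0, scale^2).
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=scale, size=n_samples)
    g = np.exp(-0.5 * x**2)
    q = np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))
    weights = g / q
    c_hat = weights.mean()
    # The variance of the weights is unknown, so it is approximated
    # from the very same sample.
    std_error = weights.std(ddof=1) / np.sqrt(n_samples)
    return c_hat, std_error

c_hat, std_error = estimate_with_standard_error(100_000)
# Approximate 95% interval based on the central limit theorem.
lower, upper = c_hat - 1.96 * std_error, c_hat + 1.96 * std_error
print(f"c_hat = {c_hat:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The interval describes the randomness of the simulation, not uncertainty about an unknown parameter, and it relies on the weights having finite variance under the chosen proposal.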

After you get your simulation-based point estimate of c,

how do you assess your uncertainty?

Is there a posterior for c?

Or a confidence interval?

Or something else?

Larry

]]>