Robins and Wasserman
As I attended Jamie Robins' session in Varanasi and did not have a clear enough idea of the Robbins and Wasserman paradox to discuss it viva voce, here are my thoughts after reading Larry's summary.

My first reaction was to question whether or not this was a Bayesian statistical problem (meaning, why should I be concerned with it?), just as the normalising constant problem was not a statistical problem. We are estimating an integral given some censored realisations of a binomial depending on a covariate through an unknown function θ(x); there is not much of a parameter. However, the way Jamie presented it through clinical trials made the problem sound definitely statistical. So end of the silly objection.

My second step was to consider the very point of estimating the entire function (or infinite-dimensional parameter) θ(x) when only the integral ψ is of interest. This is presumably the reason why the Bayesian approach fails: it sounds difficult to consistently estimate θ(x) under censored binomial observations, while ψ can be consistently estimated. Of course, if we want to estimate a probability of success like ψ, going through functional estimation sounds like overshooting. But the Bayesian modelling of the problem appears to require considering all unknowns at once, including the function θ(x), and cannot forget about it.

We encountered a somewhat similar problem with Jean-Michel Marin when working on the k-nearest neighbour classification problem: considering all the points in the testing sample altogether as unknowns would dwarf the training sample and its information content, and produce very poor inference. And so we ended up dealing with one point at a time, after harsh and intense discussions!

Now, back to the Robins and Wasserman paradox: I see no problem in acknowledging that a classical Bayesian approach cannot produce a convergent estimate of the integral ψ, simply because the classical Bayesian approach is a holistic system that cannot remove information to process a subset of the original problem. Call it the curse of marginalisation. Now, on a practical basis, would there be ways of running simulations of the missing Y's when π(x) is known in order to achieve estimates of ψ? Presumably, but they would end up with a frequentist validation…
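For the record, here is a minimal simulation sketch of the setting as I understand it from Larry's summary: X_i uniform over a high-dimensional cube, a known sampling probability π(x) bounded away from zero, R_i ~ B(π(X_i)) indicating whether Y_i ~ B(θ(X_i)) is observed, and ψ = ∫ θ(x) dx the quantity of interest. The Horvitz–Thompson estimator

    ψ̂ = (1/n) \sum_i R_i Y_i / π(X_i)

is unbiased, since E[R Y/π(X)] = E[π(X) θ(X)/π(X)] = ψ when R is independent of Y given X, and √n-consistent whatever θ is, precisely because it never tries to estimate θ(x). The θ and π below are arbitrary choices made purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100_000, 20                 # d large: theta(x) is hopeless to estimate from n points

    X = rng.uniform(size=(n, d))

    # arbitrary illustrative choices; the argument holds for any measurable
    # theta and any *known* pi bounded away from zero
    theta = lambda x: 1.0 / (1.0 + np.exp(-np.sin(20 * x).sum(axis=1)))
    pi = lambda x: 0.1 + 0.8 * x[:, 0]

    Y = rng.binomial(1, theta(X))      # success indicators, mostly censored
    R = rng.binomial(1, pi(X))         # R_i = 1 iff Y_i is actually observed

    # Horvitz-Thompson: reweight the observed Y_i by the known pi(X_i)
    psi_hat = np.mean(R * Y / pi(X))
    psi_ref = theta(X).mean()          # Monte Carlo reference value for psi

    print(f"psi_hat = {psi_hat:.4f}  vs  psi = {psi_ref:.4f}")

Of course, both the estimator and its justification here are frequentist, which is rather the point.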
January 17, 2013 at 2:30 pm
Actually, I think your first assessment was right. Many Frequentists think the defining characteristic of Bayesians is that they use Bayes' theorem. Obviously, though, the defining characteristic is an interpretation of the meaning of probability.
In this problem the Frequentist views Psi and Theta as physical facts (frequencies) to be estimated. A Bayesian views them as probability distributions to be derived from a state of knowledge.
Theta(x) is really P(1|x). To find it, a Bayesian wouldn't put a prior on the likelihood Theta and then use Bayes' theorem; they would find a state of knowledge K, calculate P(1|x,K), and determine everything from there. As in the other Wasserman problem you mention, there may be practical difficulties in stating K explicitly and translating it into a probability distribution, but they are only practical difficulties, not difficulties of principle.
Similarly, for a Bayesian, Psi is a theoretical quantity to be derived. When a Frequentist says they want to estimate it, what they mean intuitively is that they want to estimate the frequency with which Y_i = 1. Such a frequency is a real physical fact and Bayesians can estimate it, but not in the bizarre way Wasserman suggests. The way to do it is to first define the real physical quantity you want to estimate: let F = \sum Y_i/n, and then calculate the expected value of F based on the data x and the state of knowledge K. Note this can be done using the Y_i's in the data x or for some future sample (only some of the Y_i's in the current sample are known from data x). If you want to know how accurate this estimate is, just calculate the variance of F.
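To spell out that computation (writing p_i = P(Y_i=1|x_i,K) for whatever predictive probability K delivers, and assuming the missing Y_i's are conditionally independent given the x_i's and K):

    E[F|x,K]   = (1/n) ( \sum_{observed} Y_i + \sum_{missing} p_i )
    Var(F|x,K) = (1/n^2) \sum_{missing} p_i (1 - p_i)

So all the difficulty is concentrated in producing the predictive probabilities p_i for the unobserved cases.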
A Frequentist who doesn't quite get Bayesianism can generate paradoxes like this all day. All they have to do is proceed in a problem using their frequency interpretation of the probability distributions, whip out Bayes' theorem at some inappropriate moment, and then claim Bayesian statistics is all screwed up.
January 17, 2013 at 1:58 pm
I like the name: curse of marginalization.
Small point: it is Robins, not Robbins.
–Larry
January 17, 2013 at 11:47 pm
oh dear!!! sorry for the misspelling, blame jetlag…