True, you first have to replace the version of the density with something like a continuous function, in which case the maximum is uniquely defined. You could also look at it from a simulation point of view, in which case the result depends on the random sequence but not on the version of the density, I believe.

With that “argument” I was trying to address just the measure-theoretic issue of the MAP depending on a particular version of the posterior density. In other words, the maximum of a particular smooth version of the density may (?) approximate a number (the posterior expectation) which does not depend on a particular version of the density, and is a legitimate Bayes estimate.

Of course, the invariance with respect to reparameterizations issue is left untouched.

Regards.

Hi Paulo, the argument is alas rather weak because, for instance, it does not work well under a change of variable, i.e. a reparameterisation… The best argument is that the MAP is a weighted maximum likelihood estimate or, from a computational viewpoint, a first-order approximation. But the connection with the mean may be very, very weak…
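A minimal numerical sketch of the change-of-variable problem (assuming, purely for illustration, a Beta(2, 5) posterior for θ and the logit reparameterisation):

```python
import math

a, b = 2.0, 5.0  # illustrative Beta(a, b) posterior for theta

def log_post_theta(t):
    # log of the Beta(a, b) density in the theta chart, up to a constant
    return (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)

def log_post_phi(phi):
    # density of phi = logit(theta): picks up the Jacobian theta * (1 - theta)
    t = 1 / (1 + math.exp(-phi))
    return log_post_theta(t) + math.log(t) + math.log(1 - t)

# crude grid maximisation is enough to make the point
map_theta = max((i / 10000 for i in range(1, 10000)), key=log_post_theta)
map_phi = max((-10 + i / 1000 for i in range(20001)), key=log_post_phi)
map_theta_via_phi = 1 / (1 + math.exp(-map_phi))

# MAP in theta: (a-1)/(a+b-2) = 0.2; MAP in phi, mapped back: a/(a+b) ~ 0.286
print(round(map_theta, 3), round(map_theta_via_phi, 3))
```

Each chart yields its own maximiser of its own density version, so the MAP depends on the parameterisation even though both densities describe the same posterior.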

Cheers, Xi’an

Yesterday a friend of mine (he is a computer scientist) asked me about the justification of the MAP (which he uses a lot) as a valid Bayes estimate. I improvised the following “justification” (!):

“… under suitable regularity conditions, the MAP, evaluated with the help of a particular smooth version of the posterior density, approximates the posterior mean when the sample size increases, so …”
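That asymptotic merging is easy to see in a conjugate sketch (Bernoulli data with a flat Beta prior; the data-generating probability 0.3 is made up for the illustration):

```python
import random

random.seed(1)
p_true = 0.3  # made-up data-generating probability

for n in [10, 100, 1000, 10000]:
    s = sum(random.random() < p_true for _ in range(n))
    # flat Beta(1, 1) prior => posterior is Beta(1 + s, 1 + n - s)
    map_est = s / n                  # posterior mode
    mean_est = (1 + s) / (2 + n)     # posterior mean
    print(n, round(abs(map_est - mean_est), 5))  # the gap shrinks like O(1/n)
```

The gap vanishes at rate 1/n here, but in less regular models the finite-sample connection between the two estimates can still be weak.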

Can you buy that? Or is it too nasty?

Cheers and I hope you and your family are doing well.

Paulo C. Marques F.

My feeling is that the dependence of Jeffreys’ prior on the whole density is not of the same nature as the p-value’s dependence on the data probability distribution, the case that prompted Jeffreys’ famous comment. Overall, I am not utterly convinced by the “Likelihood Principle” that everything should only depend on the likelihood ℓ(θ|x), if x is the data. Indeed, Bayesian inference is based on Bayes’ theorem and the principles of (conditional) probability, which implies the whole of f(·|θ), which is what I meant by “shape of the sampling density” in loose terms…

Could you explain “because Jeffreys’ inference only depends on the shape of the sampling density” a little more? I think it is an important point.

Equivariance seems a better choice. I had seen it used for symmetry properties of estimators with respect to transformations of the sample space, but not in the case of transformations of the parameter space (so-called reparameterizations). I will check your reference.

Thank you!

I do not think there can be an agreement on the choice of the reference prior, ever, and not only for the reason you mention, namely the incompatibility of the Jeffreys prior with the likelihood principle. I however disagree with the statement that “our inference statements depend on sample space points which have not been observed”, because Jeffreys’ inference only depends on the shape of the sampling density, which is not the same thing (and not a drawback to me).

Note that Lehmann and Casella (as well as The Bayesian Choice) use the term *equivariance* instead of *invariance*.

that can be understood as Bayes estimates — such as the posterior mean — may also exhibit the same lack-of-invariance issues as the MAP.

(By the way, invariance is not the best word here. It would be great to say “co-variance”, but of course that word has another well-established meaning for us statisticians.)

On the existence of intrinsic loss functions: when I was studying Jeffreys’ book (ToP) — since I have a background in differential geometry — I tried to push the geometric interpretation of his prior as far as I could. Summarizing, what I know about it is:

1. The parametric model becomes a manifold.

2. Each parameterization we normally use is just a possible local chart of that manifold.

3. Starting from the Kullback-Leibler and Hellinger distances, Jeffreys induced a Riemannian metric on the manifold.

4. From this metric we get a natural notion of “volume”, given by √(det g_{ij}(θ)) dθ. The full Jeffreys prior is then a measure which is intrinsically uniform on the manifold. Invariance (co-variance) of the expression of the prior in particular parameterizations is automatic.

5. A loss function given, for example, by the square of the geodesic distance between two points of the manifold (which, in many cases, will have a nice expression in a particular chart) is one simple way to construct an invariant (co-variant) Bayes estimate.
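The automatic co-variance in point 4 can be checked numerically for the Bernoulli model, whose Fisher metric is 1/(θ(1−θ)). A sketch (the reparameterization θ = sin²(φ) is an arbitrary illustrative choice):

```python
import math

def theta_of(phi):
    # illustrative reparameterization theta = sin^2(phi)
    return math.sin(phi) ** 2

def score_phi(x, phi, h=1e-6):
    # numerical derivative of the Bernoulli log-likelihood w.r.t. phi
    def ll(p):
        t = theta_of(p)
        return x * math.log(t) + (1 - x) * math.log(1 - t)
    return (ll(phi + h) - ll(phi - h)) / (2 * h)

def jeffreys_phi(phi):
    # Fisher information in the phi chart: expected squared score over x in {0, 1}
    t = theta_of(phi)
    info = t * score_phi(1, phi) ** 2 + (1 - t) * score_phi(0, phi) ** 2
    return math.sqrt(info)

def jeffreys_theta_pushed(phi, h=1e-6):
    # Jeffreys prior in theta, sqrt(1/(t(1-t))), times the Jacobian |d theta / d phi|
    t = theta_of(phi)
    jac = abs((theta_of(phi + h) - theta_of(phi - h)) / (2 * h))
    return math.sqrt(1.0 / (t * (1.0 - t))) * jac

for phi in [0.3, 0.7, 1.1]:
    print(round(jeffreys_phi(phi), 4), round(jeffreys_theta_pushed(phi), 4))
```

The two columns agree in every chart (here both are identically 2): recomputing √det g directly in the new chart gives the same measure as pushing the old prior through the Jacobian, which is exactly the “automatic” co-variance of point 4.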

Browsing the literature, I’ve found out that these things are known one way or another. So, what is the problem with this program of intrinsic Bayesian inference? What I don’t like about it is that the metric depends on the whole structure of the sample space, and so all of our inference statements depend on sample-space points which have not been observed.

About “how good is this g_{ij}?”, there is a (very difficult to understand) theorem of Chentsov which says that this “information metric” is the only one satisfying a reasonable set of desiderata.

So, despite the mathematical beauty of these ideas, I’m not sure that they must be at the foundation of our theory of inference.