First of all, we would like to thank you for the attention you paid to our paper and for all your fruitful remarks. They will be taken into account in the next version of the article. However, we have several comments to clarify some details.

First, you say that, because of the thresholding operator, the Shrinkage-Thresholding MALA algorithm does not sample exactly from the target distribution. This is true for the hard thresholding operator (Section 3.2), which avoids shrinking the active rows (but cannot draw rows with a norm lower than a given threshold). The two other operators, namely the L2,1 proximal operator and the soft thresholding function, do not suffer from this flaw and can propose new rows with norms as close to zero as needed, as illustrated in Figure 1. Therefore, in the numerical section, both RJMCMC and STMALA target the right distribution.
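To make the distinction concrete, here is a minimal sketch of the two operators that do not suffer from this flaw (entrywise soft thresholding and the row-wise L2,1 proximal operator), in their standard textbook forms; the function names and the toy matrix are ours, not from the paper. Note how a row with small norm is set exactly to zero, while an active row is only shrunk, so proposals can reach any norm arbitrarily close to zero.

```python
import numpy as np

def soft_threshold(X, lam):
    """Entrywise soft thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def prox_l21(X, lam):
    """Row-wise L2,1 proximal operator: shrink each row's Euclidean
    norm by lam, zeroing rows whose norm is below lam."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return X * scale

X = np.array([[3.0, 4.0],    # norm 5 > lam: row shrunk but stays active
              [0.3, 0.4]])   # norm 0.5 < lam: row set exactly to zero
print(prox_l21(X, 1.0))     # -> [[2.4, 3.2], [0., 0.]]
```

In contrast, a hard thresholding operator keeps active rows untouched, which is why it cannot produce rows with norm in (0, threshold).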

In the numerical section, Figure 5 displays the error obtained when the algorithms are used to estimate the activation probabilities (defined in equation (18)). In addition, the ordinate axis of this figure (as well as that of Figure 10, for example) is on a logarithmic scale, which explains why the curves show no plateau even though the algorithms do converge.

Once again, many thanks for your remarks which will help us to write an improved version of the article.

Best regards,

Amandine Schreck and her co-authors.

* The methodology developed in the paper is not limited to large data sets: it is applicable to any scenario where the Metropolis-Hastings ratio is intractable, as long as one has exact confidence intervals for a Monte Carlo estimate of this ratio. For example, it could be applied to doubly intractable targets, where standard pseudo-marginal techniques are often impossible to use because obtaining non-negative unbiased estimates requires perfect samples. It is also worth mentioning that the idea of using confidence intervals to decide whether to accept or reject a proposal within Metropolis-Hastings in such intractable scenarios is not original: it has been proposed several times since the late 1980s in operations research (see the references in our paper). Unfortunately, the approximate confidence intervals used in all previous work we are aware of can give misleading results, as demonstrated empirically in our paper. Our adaptive sampling strategy, combined with exact confidence intervals, is more conservative and more robust, and it allows us to provide a quantitative bound between the target of interest and the perturbed target we are actually sampling from.
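The accept/reject idea can be sketched schematically as follows. This is not the paper's exact algorithm: it uses a Hoeffding-style exact confidence interval (which assumes the individual log-ratio draws are bounded), and all names, the doubling schedule, and the toy numbers are our own illustration. The point is the adaptive loop: keep sampling until the interval around the estimated log MH ratio excludes the acceptance threshold, so the decision is made with controlled error probability.

```python
import math
import random

def adaptive_decision(sample_term, bound, log_u, delta=1e-3, n0=32, n_max=10**6):
    """Accept/reject a proposal via an exact Hoeffding confidence interval.

    sample_term() returns an unbiased draw of the log MH ratio, assumed
    bounded in [-bound, bound]. We sample until the interval around the
    running mean excludes the threshold log_u, doubling the budget each
    round (schematic sketch, not the algorithm from the paper).
    """
    total, n = 0.0, 0
    target = n0
    while n < n_max:
        while n < target:
            total += sample_term()
            n += 1
        mean = total / n
        # Hoeffding: |mean - E| <= (b - a) * sqrt(log(2/delta) / (2n))
        half = 2 * bound * math.sqrt(math.log(2 / delta) / (2 * n))
        if mean - half > log_u:
            return True           # confidently accept
        if mean + half < log_u:
            return False          # confidently reject
        target *= 2               # interval still straddles log_u: more samples
    return mean > log_u           # budget exhausted: fall back to point estimate

random.seed(0)
# Toy example: true log-ratio 0.5, noisy bounded draws, threshold log(u) = 0
print(adaptive_decision(lambda: 0.5 + random.uniform(-1, 1), bound=1.5, log_u=0.0))
```

An approximate (e.g. CLT-based) interval plugged into the same loop can understate the error and terminate too early, which is the failure mode the exact intervals avoid.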

* The large data set scenario is not a particularly compelling application of the methodology, as is clearly acknowledged in the paper. The problem is that, in a large data set context, the estimate of the MH ratio is obtained using a "blind" Monte Carlo strategy and typically has a large variance (as we might miss very informative observations). We could obviously use an importance-sampling-type strategy to reduce this variance (and consequently improve the performance of the algorithm), but this does not appear very realistic from a practical point of view in the large data set context.

On the first point, you’re right. It’s really hard to do in general!

The disease mapping example (the BYM model) we have in the paper is nice because it has a (relatively) local component (made up of the sum of the structured and unstructured effects) that is controlled by two parameters. By thinking about what the model component is supposed to do, we can see that there is a better parameterisation in terms of a variance parameter (turning the tap on/off) and a mixing parameter that controls how much structure is in the component (changing the balance of hot and cold water). This has a sensible implicit order: models can be written in order of complexity as

nothing -> iid -> structured.

And because of this hierarchy of “base” models and the natural, interpretable ordering, we can set good priors for this local component without needing to know the whole global model. (In the Germany example, there is a separate spline model on the covariate, so this isn’t a one-component graphical model.)
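In symbols, the tap/mixing parameterisation being described is essentially the one used in the scaled BYM (BYM2) model; the notation below is ours:

```latex
b = \sigma \left( \sqrt{1-\phi}\, v \;+\; \sqrt{\phi}\, u \right),
\qquad v \sim N(0, I), \quad u \sim \text{scaled structured (ICAR) effect},
```

where $\sigma \ge 0$ is the variance parameter (the tap: $\sigma = 0$ gives "nothing") and $\phi \in [0,1]$ is the mixing parameter (the hot/cold balance: $\phi = 0$ gives the iid model, $\phi = 1$ the fully structured one), which matches the nothing -> iid -> structured ordering.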

—–

“Each shell of the Russian doll corresponds to a further level of complexity whose order need be decided by the modeller… Not very realistic in a hierarchical model with several types of parameters having only local meaning.”

—–

But we also needed to know *a lot* about what the model component does. I would argue that that’s neither a feature nor a bug; it’s a reality. Parameterising a model in a way that facilitates sensible priors is a modelling issue and needs to be handled by the modeller. So I’m not sure I agree with the above quote. But if it’s possible to do for “general” (or some, or a few) components what we did for the BYM model, then local specifications are still possible. In the end, what I hope this paper does is give the modeller some idea of what to look for when parameterising models and how to then set some useful priors.

Beyond this, finding approximately orthogonal parameterisations and setting independent priors is the best we’ve done so far…

(There’s almost certainly some link here with the work of Cox and Reid arguing for orthogonal parameterisations as a way to facilitate parameter estimation.)

There really is *so much* exciting work to be done here (can you tell I’m still excited by this paper?), but I think some things are common to any method for setting priors on multiple parameters. And there are definitely questions about how to really stack these together across complex (object-oriented) graphical models. In this setup, there are also interesting questions about how strongly to penalise components of models that are over-specified. (We talked a little about mixture models…)

As for testing, well, it hasn’t come up yet for me and I’m not planning on going hunting for problems. It’s one of those things I just don’t find very interesting (like, say, football). But you never know… (and there are lots of really smart people who are interested in it, so I’m pretty sure the area will survive without my amateur musings ;p )
