Chris Sherlock, Andrew Golightly and Daniel Henderson recently arXived a paper on a new kind of delayed acceptance.
“With simplicity in mind, we focus on a k-nearest neighbour regression model as the cheap surrogate.”
The central notion in the paper is to extrapolate from values of the likelihoods at a few points in the parameter space towards the whole space through a k-nearest neighbour estimate. While this solution is simple and relatively cheap to compute, it is unclear it is a good surrogate because it does not account for the structure of the model while depending on the choice of a distance. Recent works on Gaussian process approximations seem more relevant. See e.g. papers by Ed Meeds and Max Welling, or by Richard Wilkinson for ABC versions. Obviously, because this is a surrogate only for the first stage delayed acceptance (while the second stage is using the exact likelihood, as in our proposal), the approximation does not have to be super-tight. It should also favour the exploration of tails since (a) any proposal θ outside the current support of the chain is allocated a surrogate value that is the average of its k neighbours, hence larger than the true value in the tails, and (b) due to the delay a larger scale can be used in the random walk proposal. As the authors acknowledge, the knn method deteriorates quickly with the dimension. And computing the approximation grows with the number of MCMC iterations, given that the algorithm is adaptive and uses the exact likelihood values computed so far. Only for the first stage approximation, though, which explains “why” the delayed acceptance algorithm converges. I wondered for a short while whether this was enough to justify convergence, given that the original Metropolis-Hastings probability is just broken into two parts. Since the second stage compensates for the use of a surrogate on the first step, it should not matter in the end. However, the rejection of a proposal still depends on this approximation, i.e., differs from the original algorithm, and hence is turning the Markov chain into a non-Markovian process.
“The analysis sheds light on how computationally cheap the deterministic approximation needs to be to make its use worthwhile and on the relative importance of it matching the `location’ and curvature of the target.”
I had missed the “other” paper by some of the authors on the scaling of delayed acceptance, where they “assume that the error in the cheap deterministic approximation is a realisation of a random function” (p.3). In which they provide an optimal scaling result for high dimensions à la Roberts et al. (1997), namely a scale of 2.38 (times the target scale) in the random walk proposal. The paper however does not describe the cheap approximation to the target or pseudo-marginal version.
A large chunk of the paper is dedicated to the construction and improvement of the KD-tree used to find the k nearest neighbours. In O(d log(n)) time. Algorithm on which I have no specific comment. Except maybe that the construction of a KD-tree in accordance with a Mahalanobis distance discussed in Section 2.1 requires that the MCMC algorithm has properly converged, which is unrealistic. And also that the construction of a balanced tree seems to require heavy calibrations.
The paper is somewhat harder to read than need be (?) because the authors cumulate the idea of delayed acceptance based on this knn approximation with the technique of pseudo-marginal Metropolis-Hastings. While there is an added value in doing so it complexifies the exposition. And leads to ungainly acronyms like adaptive “da-PsMMH”, which simply are un-readable (!).
I would suggest some material to be published as supplementary material and the overall length of the paper to be reduced. For instance, Section 4.2 is not particularly conclusive. See, e.g., Theorem 2. Or the description of the simulated models in Section 5, which is sometimes redundant.