Archive for invariant measure

Kick-Kac teleportation

Posted in Books, pictures, Statistics on January 23, 2022 by xi'an

Randal Douc, Alain Durmus, Aurélien Enfroy, and Jimmy Olsson have arXived their Kick-Kac teleportation paper, which Randal presented at CIRM last semester. It is based on Kac’s theorem, which states that, for a Markov chain with invariant distribution π, under stationarity (that is, when the chain starts from π), the average tour between two visits to an accessible set C is also stationary, in the sense that an expectation under π can be written as π(C) times the expected sum over a tour started from π restricted to C. This can be used for approximating π(h) if π(C) is known (or well-estimated). Jim Hobert and I exploited this theorem in our 2004 perfect sampling paper. The current paper contains a novel proof of the theorem under weaker conditions. (Note that the only condition on C is that it is accessible, rather than that it is a small set, an assumption that only becomes necessary for geometric ergodicity, see condition (A4).)
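To make the use of Kac’s theorem for approximating π(h) concrete, here is a minimal sketch on a toy finite state space. Everything in it is my own assumption rather than the authors’ construction: the five-state target, the nearest-neighbour Metropolis kernel, and the helper names `metropolis_kernel` and `kac_estimate`. Each tour starts from π restricted to C, runs until the chain returns to C, and the sums of h over the tours are averaged and multiplied by π(C).

```python
import numpy as np


def metropolis_kernel(pi):
    """Transition matrix of a nearest-neighbour Metropolis chain on {0,...,n-1}.

    Toy construction (an assumption of this sketch): propose x-1 or x+1 with
    probability 1/2 each, accept with probability min(1, pi[y]/pi[x]), and
    stay put when the proposal is rejected or falls outside the state space.
    """
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()
    return P


def kac_estimate(P, pi, C, h, n_tours=20000, seed=0):
    """Estimate pi(h) as pi(C) times the average sum of h over tours from C."""
    rng = np.random.default_rng(seed)
    n = len(pi)
    in_C = np.zeros(n, dtype=bool)
    in_C[C] = True
    pi_C = pi[in_C].sum()
    # pi restricted to C and renormalised: the starting law of every tour
    start_probs = np.where(in_C, pi, 0.0) / pi_C
    total = 0.0
    for _ in range(n_tours):
        x = rng.choice(n, p=start_probs)
        tour_sum = h[x]                  # the starting point belongs to the tour
        x = rng.choice(n, p=P[x])
        while not in_C[x]:               # stop at the first return to C
            tour_sum += h[x]
            x = rng.choice(n, p=P[x])
        total += tour_sum
    return pi_C * total / n_tours


if __name__ == "__main__":
    target = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
    kernel = metropolis_kernel(target)
    values = np.arange(5.0) ** 2         # h(x) = x^2
    print("Kac estimate:", kac_estimate(kernel, target, [2], values))
    print("exact pi(h): ", target @ values)
```

Taking h constant and equal to one recovers the classical statement that the mean return time to C is 1/π(C). Note that this toy version needs exact draws from π restricted to C, which is trivial on five states and the hard part in general.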

What they define as the Kick-Kac teleportation (KKT) process is the collection of trajectories between two visits to C. Their memoryless version requires perfect simulations from π restricted to the set C, with a natural extension based on a Markov kernel that keeps π restricted to the set C stationary. And a further generalisation, allowing for lighter tails, that also contains the 2005 paper by Brockwell and Kadane as a special case.

The ability to generate from a different kernel Q at each visit to C allows for different dynamics (as in other composite kernels). In their illustrations, the authors use lowest density regions for C, which is rather surprising to me. Except that it allows for a better connection between modes of the target π: the better performance of the KKT algorithms against the considered alternatives apparently depends on the ability of the kernel Q to explore other modes with sufficient frequency.

optimal choice among MCMC kernels

Posted in Statistics on March 14, 2019 by xi'an

Last week in Siem Reap, Florian Maire [who I discovered originates from a Norman town less than 10km from my hometown!] presented an arXived joint work with Pierre Vandekerkhove at the Data Science & Finance conference in Cambodia that considers the following problem: given a large collection of MCMC kernels, how to pick the best one and how to define what best means. Going by mixtures is a default exploration of the collection, as shown in Tierney (1994) for instance, since the mixture improves on each of its component kernels (esp. when each kernel is not irreducible on its own!). This paper considers a move to local weights in the mixture, weights that are not estimated from earlier simulations, contrary to what I first understood.
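The remark about mixtures helping when no kernel is irreducible on its own can be checked numerically on a toy case. The sketch below is only an illustration under my own assumptions, not the setting of the paper: a target on a 2×3 grid, two Gibbs-style kernels that each refresh a single coordinate, and a mixture with fixed (not local) weights 1/2. The helper names `gibbs_kernel`, `is_invariant` and `is_irreducible` are hypothetical.

```python
import numpy as np


def gibbs_kernel(pi2d, axis):
    """Kernel that redraws one coordinate from its conditional under pi2d.

    States (i, j) of the grid are flattened to the index i * n2 + j.
    The other coordinate never moves, so the kernel is pi-invariant
    but cannot be irreducible on the whole grid.
    """
    n1, n2 = pi2d.shape
    P = np.zeros((n1 * n2, n1 * n2))
    for i in range(n1):
        for j in range(n2):
            x = i * n2 + j
            if axis == 0:
                cond = pi2d[:, j] / pi2d[:, j].sum()   # law of i given j
                for k in range(n1):
                    P[x, k * n2 + j] = cond[k]
            else:
                cond = pi2d[i, :] / pi2d[i, :].sum()   # law of j given i
                for k in range(n2):
                    P[x, i * n2 + k] = cond[k]
    return P


def is_invariant(P, pi):
    """Check pi P = pi up to floating-point error."""
    return bool(np.allclose(pi @ P, pi))


def is_irreducible(P):
    """Every state reaches every other one iff (I + A)^(n-1) > 0 entrywise,
    where A is the adjacency pattern of P."""
    n = P.shape[0]
    reach = np.linalg.matrix_power(np.eye(n) + (P > 0), n - 1)
    return bool((reach > 0).all())


pi2d = np.array([[0.10, 0.20, 0.05],
                 [0.30, 0.15, 0.20]])
pi = pi2d.ravel()
P1 = gibbs_kernel(pi2d, axis=0)   # moves the first coordinate only
P2 = gibbs_kernel(pi2d, axis=1)   # moves the second coordinate only
mix = 0.5 * P1 + 0.5 * P2         # fixed-weight mixture

if __name__ == "__main__":
    for name, K in [("P1", P1), ("P2", P2), ("mixture", mix)]:
        print(name, "invariant:", is_invariant(K, pi),
              "irreducible:", is_irreducible(K))
```

With fixed weights, invariance of the mixture follows directly from invariance of each component. Weights that depend on the current state do not enjoy this free lunch in general, which is where the question of a correction step comes in.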

As made clearer in the paper, the focus is on filamentary distributions that are concentrated near lower-dimensional sets or manifolds, since the components of the kernel collection can then be restricted to directions of these manifolds… Including an interesting case of a 2-D highly peaked target where converging means mostly simulating in x¹ and covering the target means mostly simulating in x². Exhibiting a schizophrenic tension between the two goals. Locally dependent weights mean a correction by a Metropolis step, with cost O(n). What of Rao-Blackwellisation of these mixture weights, from weight × transition to full mixture, as in our PMC paper? Unclear to me as well [during the talk] was the use in the mixture of basic Metropolis kernels, which are not absolutely continuous because of the Dirac mass component. But this is clarified by Section 5 in the paper. A surprising result from the paper (Corollary 1) is that the use of local weights ω(i,x) that depend on the current value of the chain does not jeopardize the stationary measure π(.) of the mixture chain. Which may be due to the fact that all components of the mixture are already π-invariant. Or that the index of the kernel constitutes an auxiliary (if ancillary) variate. (Algorithm 1 in the paper reminds me of delayed acceptance. Making me wonder if computing time should be accounted for.) A final question I briefly discussed with Florian is the extension to weights that are automatically constructed from the simulations and the target.

revisiting marginalisation paradoxes [Bayesian reads #1]

Posted in Books, Kids, pictures, Statistics, Travel, University life on February 8, 2019 by xi'an

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992 since he was sharing an office with my friend Costas Goutis], and Jim Zidek. A paper that also appears in my (recent) slides as an exercise. And has been discussed many times on this ‘Og.

Reading the paper on the train to Oxford was quite pleasant, with a few discoveries like an interesting jab at Fraser’s structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Muller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400-word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although he also draws the consequences for standard statistics, since it allows for admissible procedures that are associated with improper priors. If the latter must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold about having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. This reflects the then-intense opposition between Bayesians and fiducialist Fisherians. (Funnily enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is mentioned in the reply by the authors.)
