precision in MCMC
While browsing Images des Mathématiques, I came across this article [in French] that studies the impact of round-off errors in number representations on a dynamical system, and I checked how much this was the case for MCMC algorithms like the slice sampler (recycling some R code from Monte Carlo Statistical Methods), simply by adding a few signif(…, dig=n) calls to the original R code and letting the precision n vary.
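Here is a minimal sketch of that kind of experiment (my own rewrite, not the book's code), assuming a standard normal target truncated to [1, 2] and rounding every intermediate quantity to n significant digits:

```r
## slice sampler for a N(0,1) density truncated to [1,2], with all
## intermediate values rounded to n significant digits via signif()
prec_slice <- function(niter = 1e4, n = 2) {
  x <- numeric(niter)
  x[1] <- 1.5
  for (t in 2:niter) {
    # vertical move: uniform under the unnormalised density exp(-x^2/2)
    u <- signif(runif(1, 0, exp(-x[t - 1]^2 / 2)), n)
    # horizontal move: uniform on the slice {x in [1,2]: exp(-x^2/2) >= u}
    upper <- signif(min(2, sqrt(-2 * log(u))), n)
    # rounding may push the slice boundary below the truncation point 1,
    # which is precisely the kind of degradation discussed in the article;
    # clamp it to keep the chain alive
    x[t] <- signif(runif(1, 1, max(upper, 1)), n)
  }
  x
}
hist(prec_slice(n = 2), breaks = 50)  # strong discretisation artefacts
hist(prec_slice(n = 3), breaks = 50)  # already much closer to the target
```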
“…if one simulates trajectories over very long time intervals, too long relative to the chosen numerical precision, then quite often the simulation results will be completely different from what actually happens…”
Rather unsurprisingly (!), using a small enough precision (like two digits on the first row) has a visible impact on the simulation of a truncated normal. Moving to three digits seems to be sufficient in this example… One thing this tiny experiment reminds me of is the lumpability property of Kemeny and Snell, a restriction on Markov chains under which aggregated (or discretised) versions remain ergodic or even Markov. Also, in 2000, Laird Breyer, Gareth Roberts and Jeff Rosenthal wrote a Statistics and Probability Letters paper on the impact of round-off errors on geometric ergodicity. However, I presume [maybe foolishly!] that the result stated in the original paper, namely that there exists an infinite number of precision digits for which the dynamical system degenerates into a small region of the space, does not hold for MCMC. Maybe foolishly so, because the above statement means that running a dynamical system for “too” long given the chosen precision kills the intended stationary properties of the system, which I interpret as getting non-ergodic behaviour when exceeding the period of the uniform generator. More or less.
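For the record, the (strong) lumpability condition, as I remember it from Kemeny and Snell: a finite chain with transition matrix P is lumpable with respect to a partition A₁,…,A_k when, for every pair (i, j) and every pair of states x, x′ in A_i,

$$\sum_{y \in A_j} P(x, y) \;=\; \sum_{y \in A_j} P(x', y),$$

in which case the aggregated chain on the partition is itself Markov.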
January 14, 2016 at 7:28 pm
This is one of those things that I probably failed to get across the other week at that Big Models meeting. A lot of the time (such as GP regression with squared exponential covariance functions), you’ll really struggle to get even two correct decimal places for the intermediate calculations. This, to me, kills any idea that MCMC (or any other computational method) will target the correct posterior. It may not even be close.
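For instance, a quick back-of-the-envelope check (a sketch of mine, with made-up inputs) of how badly conditioned a squared exponential covariance matrix gets, even at a modest size:

```r
## 50 equally spaced inputs and a squared exponential kernel, length-scale 0.3
x <- seq(0, 1, length.out = 50)
K <- exp(-outer(x, x, "-")^2 / (2 * 0.3^2))
kappa(K)  # condition number of order 1e17 or worse, often numerically singular
## with ~16 significant digits in double precision, a condition number that
## large leaves barely a couple of reliable decimals in solves such as K^{-1} y
```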
I assume that there is a core of people in the ML and BNP communities having conversations about these sorts of things (given how unavoidable they are when combining Gaussian processes with big data). To some extent this paper [arXiv:1501.06195] will solve the problem, but it’s focussing more on the question of “how much information do we need to solve the problem?” rather than “how big can a problem in this class be and still be solved on a computer?”, which is just as critical.
January 14, 2016 at 8:55 pm
Which is probably why one can solve almost anything with linear regression: the model error of the approximate model is smaller than the numerical error of the correct model.
On a slightly more serious note, I have always wondered whether single precision floating point might not be just as good (or as bad) as double precision.
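One crude way to get a feel for it, along the lines of the post, is to emulate lower precision with signif() rather than actual single precision floats: run the same random walk Metropolis chain with about 7 significant digits, roughly what single precision carries, and with full double precision, and compare the output (a sketch of mine, not a definitive test):

```r
## random walk Metropolis for a N(0,1) target, with intermediate quantities
## rounded to dig significant digits (dig = Inf means plain double precision)
rwm <- function(niter = 1e4, dig = Inf) {
  r <- function(z) if (is.finite(dig)) signif(z, dig) else z
  x <- numeric(niter)
  for (t in 2:niter) {
    prop <- r(x[t - 1] + rnorm(1))
    logratio <- r(dnorm(prop, log = TRUE) - dnorm(x[t - 1], log = TRUE))
    x[t] <- if (log(runif(1)) < logratio) prop else x[t - 1]
  }
  x
}
set.seed(1)
c(double = var(rwm()), single_ish = var(rwm(dig = 7)))  # both should be near 1
```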
January 14, 2016 at 11:22 pm
We can solve anything except linear regression problems!
January 15, 2016 at 8:09 am
An even better theorem: we cannot solve anything! Leaving us plenty of time for Voltaire’s “cultiver notre jardin”..!