Archive for folded Markov chain

Takaisin helsinkiin

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , on July 23, 2017 by xi'an

I am off tomorrow morning to Helsinki for the European Meeting of Statisticians (EMS 2017). Where I will talk on how to handle multiple estimators in Monte Carlo settings (although I have not made enough progress in this direction to include anything truly novel in the talk!) Here are the slides:

I look forward this meeting, as I remember quite fondly the previous one I attended in Budapest. Which was of the highest quality in terms of talks and interactions. (I also remember working hard with Randal Douc on a yet-unfinished project!)

warp-U bridge sampling

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , on October 12, 2016 by xi'an

[I wrote this set of comments right after MCqMC 2016 on a preliminary version of the paper so mileage may vary in terms of the adequation to the current version!]

In warp-U bridge sampling, newly arXived and first presented at MCqMC 16, Xiao-Li Meng continues (in collaboration with Lahzi Wang) his exploration of bridge sampling techniques towards improving the estimation of normalising constants and ratios thereof. The bridge sampling estimator of Meng and Wong (1996) is an harmonic mean importance sampler that requires iterations as it depends on the ratio of interest. Given that the normalising constant of a density does not depend on the chosen parameterisation in the sense that the Jacobian transform preserves this constant, a degree of freedom is in the choice of the parameterisation. This is the idea behind warp transformations. The initial version of Meng and Schilling (2002) used location-scale transforms, while the warp-U solution goes for a multiple location-scale transform that can be seen as based on a location-scale mixture representation of the target. With K components. This approach can also be seen as a sort of artificial reversible jump algorithm when one model is fully known. A strategy Nicolas and I also proposed in our nested sampling Biometrika paper.

Once such a mixture approximation is obtained. each and every component of the mixture can be turned into the standard version of the location-scale family by the appropriate location-scale transform. Since the component index k is unknown for a given X, they call this transform a random transform, which I find somewhat more confusing that helpful. The conditional distribution of the index given the observable x is well-known for mixtures and it is used here to weight the component-wise location-scale transforms of the original distribution p into something that looks rather similar to the standard version of the location-scale family. If no mode has been forgotten by the mixture. The simulations from the original p are then rescaled by one of those transforms, which index k is picked according to the conditional distribution. As explained later to me by XL, the random[ness] in the picture is due to the inclusion of a random ± sign. Still, in the notation introduced in (13), I do not get how the distribution Þ [sorry for using different symbols, I cannot render a tilde on a p] is defined since both ψ and W are random. Is it the marginal? In which case it would read as a weighted average of rescaled versions of p. I have the same problem with Theorem 1 in that I do not understand how one equates Þ with the joint distribution.

Equation (21) is much more illuminating (I find) than the previous explanation in that it exposes the fact that the principle is one of aiming at a new distribution for both the target and the importance function, with hopes that the fit will get better. It could have been better to avoid the notion of random transform, then, but this is mostly a matter of conveying the notion.

On more specifics points (or minutiae), the unboundedness of the likelihood is rarely if ever a problem when using EM. An alternative to the multiple start EM proposal would then be to get sequential and estimate the mixture in a sequential manner, only adding a component when it seems worth it. See eg Chopin and Pelgrin (2004) and Chopin (2007). This could also help with the bias mentioned therein since only a (tiny?) fraction of the data would be used. And the number of components K has an impact on the accuracy of the approximation, as in not missing a mode, and on the computing time. However my suggestion would be to avoid estimating K as this must be immensely costly.

Section 6 obviously relates to my folded Markov interests. If I understand correctly, the paper argues that the transformed density Þ does not need to be computed when considering the folding-move-unfolding step as a single step rather than three steps. I fear the description between equations (30) and (31) is missing the move step over the transformed space. Also on a personal basis I still do not see how to add this approach to our folding methodology, even though the different transforms act as as many replicas of the original Markov chain.

MCqMC [#3]

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on August 20, 2016 by xi'an

On Thursday, Christoph Aistleiter [from TU Gräz] gave a plenary talk at MCqMC 2016 around Hermann Weyl’s 1916 paper, Über die Gleichverteilung von Zahlen mod. Eins, which demonstrates that the sequence a, 22a, 32a, … mod 1 is uniformly distributed on the unit interval when a is irrational. Obviously, the notion was not introduced for simulation purposes, but the construction applies in this setting! At least in a theoretical sense. Since for instance the result that the sequence (a,a²,a³,…) mod 1 being uniformly distributed for almost all a’s has not yet found one realisation a. But a nice hour of history of mathematics and number theory: it is not that common we hear the Riemann zeta function mentioned in a simulation conference!

The following session was a nightmare in that I wanted to attend all four at once! I eventually chose the transport session, in particular because Xiao-Li advertised it at the end of my talk. The connection is that his warp bridge sampling technique provides a folding map between modes of a target. Using a mixture representation of the target and folding all components to a single distribution. Interestingly, this transformation does not require a partition and preserves the normalising constants [which has a side appeal for bridge sampling of course]. In a problem with an unknown number of modes, the technique could be completed by [our] folding in order to bring the unobserved modes into the support of the folded target. Looking forward the incoming paper! The last talk of this session was by Matti Vihola, connecting multi-level Monte Carlo and unbiased estimation à la Rhee and Glynn, paper that I missed when it got arXived last December.

The last session of the day was about probabilistic numerics. I have already discussed extensively about this approach to numerical integration, to the point of being invited to the NIPS workshop as a skeptic! But this was an interesting session, both with introductory aspects and with new ones from my viewpoint, especially Chris Oates’ description of a PN method for handling both integrand and integrating measure as being uncertain. Another arXival that went under my decidedly deficient radar.

MCqMC 2016 [#2]

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , on August 17, 2016 by xi'an

In her plenary talk this morning, Christine Lemieux discussed connections between quasi-Monte Carlo and copulas, covering a question I have been considering for a while. Namely, when provided with a (multivariate) joint cdf F, is there a generic way to invert a vector of uniforms [or quasi-uniforms] into a simulation from F? For Archimedian copulas (as we always can get back to copulas), there is a resolution by the Marshall-Olkin representation,  but this puts a restriction on the distributions F that can be considered. The session on synthetic likelihoods [as introduced by Simon Wood in 2010] put together by Scott Sisson was completely focussed on using normal approximations for the distribution of the vector of summary statistics, rather than the standard ABC non-parametric approximation. While there is a clear (?) advantage in using a normal pseudo-likelihood, since it stabilises with much less simulations than a non-parametric version, I find it difficult to compare both approaches, as they lead to different posterior distributions. In particular, I wonder at the impact of the dimension of the summary statistics on the approximation, in the sense that it is less and less likely that the joint is normal as this dimension increases. Whether this is damaging for the resulting inference is another issue, possibly handled by a supplementary ABC step that would take the first-step estimate as summary statistic. (As a side remark, I am intrigued at everyone being so concerned with unbiasedness of methods that are approximations with no assessment of the amount of approximation!) The last session of the day was about multimodality and MCMC solutions, with talks by Hyungsuk Tak, Pierre Jacob and Babak Shababa, plus mine. Hunsuk presented the RAM algorithm I discussed earlier under the title of “love-hate” algorithm, which was a kind reference to my post! (I remain puzzled by the ability of the algorithm to jump to another mode, given that the intermediary step aims at a low or even zero probability region with an infinite mass target.) And Pierre talked about using SMC for Wang-Landau algorithms, with a twist to the classical stochastic optimisation schedule that preserves convergence. And a terrific illustration on a distribution inspired from the Golden Gate Bridge that reminded me of my recent crossing! The discussion around my folded Markov chain talk focussed on the extension of the partition to more than two sets, the difficulty being in generating automated projections, with comments about connections with computer graphic tools. (Too bad that the parallel session saw talks by Mark Huber and Rémi Bardenet that I missed! Enjoying a terrific Burmese dinner with Rémi, Pierre and other friends also meant I could not post this entry on time for the customary 00:16. Not that it matters in the least…)

MCqMC 2016 [#1]

Posted in Statistics, Travel, University life with tags , , , , , , , , on August 16, 2016 by xi'an

mcqmc1This week, I attend the MCqMC 2016 conference in Stanford, which is quite an exciting gathering of researchers involved in various aspects of Monte Carlo methods. As Art Owen put it in his welcoming talk, the whole Carlo family is there! (Not to mention how pleasant the Stanford Campus currently is, after the scorching heat we met the past week in Northern California inlands.) My talk is on folded Markov chains, which is a proposal Randal and I have been working on for quite a while, with Gareth joining us more recently. The basic idea was inspired from a discussion I had about a blog post, so long ago that I cannot even trace it! Namely, when defining an inside set A and an outside set, such that the outside set can be projected onto the inside set, one can fold both the target and the proposal, essentially looking at a collection of values for each step of the Markov chain. In other words, the problem can be reduced to A at essentially no cost and with the benefits of a compact support A and of a possibly uniformly ergodic Markov chain. We are still working on the paper, but the idea is both cool and straightforward, so we decided to talk about it at Nordstat 2016 and now MCqMC 2016.