At BNP13, Brian Trippe presented the AISTAT 2022 paper he recently wrote with Tin D. Nguyen and Tamara Broderick. Which made me read their 2021 paper on the topic. There, they note that coupling may prove challenging, which they blame on label switching. Considering a naïve Gibbs sampler on the space of partitions, meaning allocating each data-point to one of the existing partitions or to a singleton, they construct an optimal transport coupling under Hamming distance. Which appears to be achievable in O(NK³log{K}), if K is the maximal number of partitions among both chains. The paper does not goes deeply into the implementation, which involves [to quote] (a) computing the distances between each pair of partitions in the Cartesian product of supports of the Gibbs conditionals and (b) solving the optimal transport problem. Except in the appendix where the book-keeping necessary to achieve O(K²) for pairwise distances and the remaining complexity follows from the standard Orlin’s algorithm. What remains unclear from the paper is that, while the chains couple faster (fastest?), the resulting estimators do not necessarily improve upon budget-equivalent alternatives. (The reason for the failure of the single chain in Figure 2 is hard to fathom.)
Archive for Hamming distance
coupling for the Gibbs sampler
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags AISTATS 2022, BNP13, Hamming distance, label switching, Lago Llanquihue, maximal coupling, optimal transport, Orlin's algorithm, Orsono volcano, partition, single chain on November 27, 2022 by xi'anHamming Ball Sampler
Posted in Books, Statistics, University life with tags auxiliary variable, error correcting codes, Hamming distance, intractable likelihood, MCMC, simulation on May 7, 2015 by xi'anMichalis Titsias and Christopher Yau just arXived a paper entitled the Hamming Ball sampler. Aimed at large and complex discrete latent variable models. The completion method is called after Richard Hamming, who is associated with code correcting methods (reminding me of one of the Master courses I took on coding, 30 years ago…), because it uses the Hamming distance in a discrete version of the slice sampler. One of the reasons for this proposal is that conditioning upon the auxiliary slice variable allows for the derivation of normalisation constants otherwise unavailable. The method still needs some calibration in the choice of blocks that partition the auxiliary variable and in the size of the ball. One of the examples assessed in the paper is a variable selection problem with 1200 covariates, out of which only 2 are relevant, while another example deals with a factorial HMM, involving 10 hidden chains. Since the paper compares each example with the corresponding block Gibbs sampling solution, it means this Gibbs sampling version is not intractable. It would be interesting to see a case where the alternative is not available…