## 21w5107 [day 2]

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , on December 1, 2021 by xi'an

After a rich and local (if freezing) dinner on a rooftop facing the baroque Oaxaca cathedral, and an early invigorating outdoor swim in my case!, the morning session was mostly on mixtures, with Helen Ogden exploring X validation for (estimating the number k of components for) finite mixtures, when using the likelihood as an objective function. I was unclear of the goal however when considering that the data supporting the study was Uniform (0,1), nothing like a mixture of Normal distributions. And about the consistency attached to the objective function. The session ended with Diana Cai presenting a counter-argument in the sense that she proved, along with Trevor Campbell and Tamara Broderick, that the posterior on k diverges to infinity with the number n of observations if a mixture model is misspecified for said data. Which does not come as a major surprise since there is no properly defined value of k when the data is not generated from the adopted mixture. I would love to see an extension to the case when the k component mixture contains a non-parametric component! In-between, Alexander Ly discussed Bayes factors for multiple datasets, with some asymptotics showing consistency for some (improper!) priors if one sample size grows to infinity. With actually attaining the same rate under both hypotheses. Luis Nieto-Barajas presented an approach on uncertainty assessment through KL divergence for random probability measures, which requires a calibration of the KL in this setting, as KL does not enjoy a uniform scale, and a prior on a Pólya tree. And Chris Holmes presented a recent work with Edwin Fong and Steven Walker on a prediction approach to Bayesian inference. Which I had had on my reading list for a while. It is a very original proposal where likelihoods and priors are replaced by the sequence of posterior predictives and only parameters of interest get simulated. The Bayesian flavour of the approach is delicate to assess though, albeit a form of non-parametric Bayesian perspective… (I still need to read the paper carefully.)

In the afternoon session, Judith Rousseau presented her recent foray in cut posteriors for semi-parametric HMMs. With interesting outcomes for efficiently estimating the transition matrix, the component distributions, and the smoothing distribution. I wonder at the connection with safe Bayes in that cut posteriors induce a loss of information. Sinead Williamson spoke on distributed MCMC for BNP. Going back at the “theme of the day”, namely clustering and finding the correct (?) number of clusters. With a collapsed versus uncollapsed division that reminded me of the marginal vs. conditional María Gil-Leyva discussed yesterday. Plus a decomposition of a random measure into a finite mixture and an infinite one that also reminded me of the morning talk of Diana Cai. (And making me wonder at the choice of the number K of terms in the finite part.) Michele Guindani spoke about clustering distributions (with firecrackers as a background!). Using the nDP mixture model, which was show to suffer from degeneracy (as discussed by Frederico Camerlenghi et al. in BA). The subtle difference stands in using the same (common) atoms in all random distributions at the top of the hierarchy, with independent weights. Making the partitions partially exchangeable. The approach relies on Sylvia’s generalised mixtures of finite mixtures. With interesting applications to microbiome and calcium imaging (including a mice brain in action!). And Giovanni Rebaudo presented a generalised notion of clustering aligned on a graph, with some observations located between the nodes corresponding to clusters. Represented as a random measure with common parameters for the clusters and separated parameters outside. Interestingly playing on random partitions, Pólya urns, and species sampling.

## ordered allocation sampler

Posted in Books, Statistics with tags , , , , , , , , , , , on November 29, 2021 by xi'an

Recently, Pierpaolo De Blasi and María Gil-Leyva arXived a proposal for a novel Gibbs sampler for mixture models. In both finite and infinite mixture models. In connection with Pitman (1996) theory of species sampling and with interesting features in terms of removing the vexing label switching features.

The key idea is to work with the mixture components in the random order of appearance in an exchangeable sequence from the mixing distribution (…) In accordance with the order of appearance, we derive a new Gibbs sampling algorithm that we name the ordered allocation sampler. “

This central idea is thus a reinterpretation of the mixture model as the marginal of the component model when its parameter is distributed as a species sampling variate. An ensuing marginal algorithm is to integrate out the weights and the allocation variables to only consider the non-empty component parameters and the partition function, which are label invariant. Which reminded me of the proposal we made in our 2000 JASA paper with Gilles Celeux and Merrilee Hurn (one of my favourite papers!). And of the [first paper in Statistical Methodology] 2004 partitioned importance sampling version with George Casella and Marty Wells. As in the later, the solution seems to require the prior on the component parameters to be conjugate (as I do not see a way to produce an unbiased estimator of the partition allocation probabilities).

The ordered allocation sample considers the posterior distribution of the different object made of the parameters and of the sequence of allocations to the components for the sample written in a given order, ie y¹,y², &tc. Hence y¹ always gets associated with component 1, y² with either component 1 or component 2, and so on. For this distribution, the full conditionals are available, incl. the full posterior on the number m of components, only depending on the data through the partition sizes and the number m⁺ of non-empty components. (Which relates to the debate as to whether or not m is estimable…) This sequential allocation reminded me as well of an earlier 2007 JRSS paper by Nicolas Chopin. Albeit using particles rather than Gibbs and applied to a hidden Markov model. Funny enough, their synthetic dataset univ4 almost resembles the Galaxy dataset (as in the above picture of mine)!

## a journal of the plague year² [not there yet]

Posted in Books, Kids, pictures, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , on November 27, 2021 by xi'an

Returned to Warwick once more, with “traffic-as-usual” at Charles de Gaulle airport, including a single border officer for the entire terminal, a short-timed fright that I actually needed a PCR test on top of my vaccine certificate to embark, due to wrong signage, a one-hour delay at departure due to foggy conditions in B’ham, and another ½ hour delay at arrival due to a shortage of staff and hence no exit stairs available! And got a tense return to B’ham as the taxi line in Warwick had vanished!

Read the first novel of P. Djèlí-Clark A Master of Djinn after reading a series of short stories and novellas of his, taking place in the same fantastic Cairo of the early 1900’s. This was enjoyable, mostly, again thanks to well-constructed characters (apart from the arch-villain) and the appeal of the magical Cairo imagined by the author. I did not feel the appearances of Raymond Poincaré or von Birsmark were really needed, though. Also kindled A history of what comes next, by Sylvain Neuvel, which I got as a free (Tor) book. Which is an interesting take on the space race, with a pair of (super-)women behind the entire thing. And a lot of connections to the actual history. I somehow got tired in the middle, even though I finished the book during my commuting to and from Warwick.

Watched within a week My Name, a dark Korean TV drama,  as I found it very good and rather original (albeit with some similarities with the excellent Jeju-based Night in Paradise). The storyline is one of a young woman, Ji Woo, seeking revenge on her father’s killer, by joining the criminal gang her father was part of and infiltrating the police (not really  a spoiler!). At the beginning, this sounded like gang glorification, hence rather unappealing, but soon things proved to be quite different from how they appeared first. The scenario is of course most unrealistic, especially the (brutal and gory) fights where the heroine takes down endless rows of gang members and where the participants almost always recover from knife injuries that should have been fatal or at least permanently damaging. And the ineffectiveness of the police in stopping the drug dealers. However, when watched as a theatrical performance, the main characters in My Name, most especially Ji Woo, are well-constructed and ambiguous enough to make this descent into darkness worth watching. (Given the conclusion of the series, I cannot imagine a second season being made.) Also had a short go at Night Teeth, which proved a complete waste of time!

## dice and sticks

Posted in Books, Kids, R with tags , , , , , , on November 19, 2021 by xi'an

A quick weekend riddle from the Riddler about the probability of getting a sequence of increasing numbers from dice with an increasing number of faces, eg 4-, 6-, and 8-face dice. Which happens to be 1/4. By sheer calculation (à la Gauss) or simple enumération (à la R):

> for(i in 1:4)for(j in (i+1):6)F=F+(8-j)
> F/4/6/8
[1] 0.25


The less-express riddle is an optimisation problem related with stick breaking: given a stick of length one, propose a fraction a and win (1-a) if a Uniform x is less than one. Since the gain is a(1-a) the maximal average gain is associated with a=½. Now, if the remaining stick (1-a) can be divided when x>a, what is the sequence of fractions one should use when the gain is the length of the remaining stick? With two attempts only, the optimal gain is still ¼. And a simulation experiment with three attempts again returns ¼.

## finding our way in the dark

Posted in Books, pictures, Statistics with tags , , , , , , , , , on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu, recently got published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries. That is, when summaries S of smaller dimension cannot be directly simulated. The idea is to try to estimate

$h(\theta)=\mathbb{P}_\theta(d(S,S^\text{obs})\le\epsilon)$

since this is the substitute for the likelihood used for ABC. (A related idea is to build an approximate and conditional [on θ] distribution on the distance, idea with which Doc. Stoehr and I played a wee bit without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… For instance using k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors.) A random forest could also be used (?). Or neural nets. The method still requires a full simulation of new datasets, so I wonder at the gain unless the replacement of the naïve indicator with h(θ) brings clear improvement to the approximation. Hence much fewer simulations. The ESS reduction is definitely improved, esp. since the CPU cost is higher. Could this be associated with the recourse to independent proposals?

In a sence, Bayesian synthetic likelihood does not convey the same appeal, since is a bit more of a tough cookie: approximating the mean and variance is multidimensional. (BSL is always more expensive!)

As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!