Archive for Dirichlet mixture priors

Bill’s 80th!!!

Posted in pictures, Statistics, Travel, University life on April 17, 2022 by xi'an

“It was the best of times,
it was the worst of times”
[Dickens’ Tale of Two Cities (which plays a role in my friendship with Bill!)]

My flight to NYC last week was uneventful and rather fast and I worked rather well, even though the seat in front of me was reclined to the max for the entire flight! (Still got glimpses of Aline and of Deepwater Horizon from my neighbours.) Taking a very early flight from Paris was great for making a full day of it once in NYC, but “forcing” me to take a taxi, which almost ended up in disaster since the Uber driver did not show up. At all. And never replied to my message. Fortunately trains were running, I was also running despite the broken rib, and I arrived at the airport some time before access was closed, grateful for the low activity that day. I also had another bit of a worrying moment at the US border control in JFK as I ended up in a back-office of the Border Police after the machine could not catch my fingerprints. And another stop at the luggage control as my lack of luggage sounded suspicious!

The conference was delightful in celebrating Bill’s career and kindness (tinged with the most gentle irony!). Among stories told at the banquet, I was surprised to learn of Bill’s jazz career side, as I had never heard him play the piano or the clarinet! Even though we had chatted about music and literature on many occasions. Since our meeting in 1989… The (scientific side of the) conference included many talks around shrinkage, from loss estimation to predictive estimation, reminding me of the roaring 70’s and 80’s [James-Stein wise]. And demonstrating the impact of Bill’s work throughout this era (incl. on my own PhD thesis). I started wondering at the (Bayesian) use of the loss estimate, though, as I set myself facing two point estimators, each attached to an estimator of its loss: it did not seem a particularly good idea to systematically pick the one with the smallest estimated loss (and Jim Berger confirmed this feeling in a later discussion).

Among the talks on topics less familiar to me, I discovered work of Genevera Allen’s on inferring massive networks for neuron connections under sparse information. And of Emma Jingfei Zhang, equally centred on network inference, with applications to brain connectivity.

In a somewhat remote connection with Bill’s work (and our joint and hilarious assessment of Pitman closeness), I presented part of our joint and current work with Adrien Hairault and Judith Rousseau on inferring the number of components in a mixture by Bayes factors when the alternative is an infinite mixture (i.e., a Dirichlet process mixture). Of which Ruobin Gong gave a terrific discussion. (With a connection to her current work on Sense and Sensitivity.)

I was most sorry to miss Larry Wasserman’s and Rob Strawderman’s talks to rush back to the airport, all the more because I am sure Larry’s talk would have brought a new light on causality (possibly equating it with tequila and mixtures!). The flight back was uneventful, the plane rather empty and I slept most of the time. Overall, it was most wonderful to re-connect with so many friends. Most of whom I had not seen for ages, even before the pandemic. And to meet new friends. (Nothing original in the reported feeling, just telling that the break in conferences and workshops was primarily a hatchet job on social relations and friendships.)

transport Monte Carlo

Posted in Books, pictures, Statistics, Travel on August 31, 2020 by xi'an

Read this recent arXival by Leo Duan (from UF in Gainesville) on transport approaches to approximate Bayesian computation, in connection with normalising flows. The author points out a “lack of flexibility in a large class of normalizing flows”  to bring forward his own proposal.

“…we assume the reference (a multivariate uniform distribution) can be written as a mixture of many one-to-one transforms from the posterior”

The transportation problem is turned into defining a joint distribution on (β,θ) such that θ is marginally distributed from the posterior and β is one of an infinite collection of transforms of θ. Which sounds quite different from normalizing flows, to be sure. Reversing the order, if one manages to simulate β from its marginal, the resulting θ is one of the transforms. Chosen to be a location-scale modification of β, s⊗β+m. The weights when going from θ to β are logistic transforms with Dirichlet distributed scales. All with parameters to be optimised by minimising the Kullback-Leibler divergence between the reference measure on β and its inverse mixture approximation, and resorting to gradient descent. (This may sound a wee bit overwhelming as an approximation strategy and I actually had to make a large cup of strong matcha to get over it, but this may be due to the heat wave occurring at the same time!) Drawing θ from this approximation is straightforward by construction and an MCMC correction can even be added, resulting in an independent Metropolis-Hastings version since the acceptance ratio remains computable. Although this may defeat the whole purpose of the exercise by stalling the chain if the approximation is poor (hence suggesting that this last step be used instead as a control).
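To make the independent Metropolis-Hastings correction mentioned above concrete, here is a minimal sketch, not the paper's implementation. It assumes a hypothetical one-dimensional target (`log_target`, an unnormalised two-component Normal mixture) and a fixed Normal proposal (`log_proposal`) standing in for the fitted transport approximation; both are illustrative choices, as are all names in the code.

```python
import math
import random

PROP_SD = 3.0  # assumed scale of the fixed (independent) proposal


def log_target(x):
    # Hypothetical unnormalised log-posterior: equal mixture of N(-2,1) and N(2,1).
    return math.log(0.5 * math.exp(-0.5 * (x + 2.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x - 2.0) ** 2))


def log_proposal(x):
    # Log-density of N(0, PROP_SD^2), up to an additive constant.
    return -0.5 * (x / PROP_SD) ** 2


def independent_mh(n_iter, seed=0):
    rng = random.Random(seed)
    x = 0.0
    log_w = log_target(x) - log_proposal(x)  # log importance weight of current state
    chain, accepted = [], 0
    for _ in range(n_iter):
        y = rng.gauss(0.0, PROP_SD)          # proposal does not depend on x
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y) / w(x)).
        if math.log(1.0 - rng.random()) < log_w_y - log_w:
            x, log_w = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


chain, acc_rate = independent_mh(5000)
```

A low acceptance rate signals that the proposal is a poor approximation to the target, which is why this step can also be read as a diagnostic rather than as a sampler in its own right.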

The paper also contains a theoretical section that studies the approximation error, going to zero as the number of terms in the mixture, K, goes to infinity. Including a Monte Carlo error in log(n)/n (and incidentally quoting a result from my former HoD at Paris 6, Paul Deheuvels). Numerical experiments show domination or equivalence with some other solutions, e.g. being much faster than HMC, the remaining $1000 question being of course the on-line evaluation of the quality of the approximation.

latent nested nonparametric priors

Posted in Books, Statistics on September 23, 2019 by xi'an

A paper on an extended type of non-parametric priors by Camerlenghi et al. [all good friends!] is about to appear in Bayesian Analysis, with a discussion open for contributions (until October 15). While a fairly theoretical piece of work, it validates a Bayesian approach for non-parametric clustering of separate populations with, broadly speaking, common clusters. More formally, it constructs a new family of models that allows for a partial or complete equality between two probability measures, but does not force full identity when the associated samples do share some common observations. Indeed, the more traditional structures prohibit one or the other, from the Dirichlet process (DP) prohibiting two probability measure realisations from being equal or partly equal to some hierarchical DP (HDP) already allowing for common atoms across measure realisations, but prohibiting complete identity between two realised distributions, to nested DP offering one extra level of randomness, but with an infinity of DP realisations that prohibits common atomic support besides completely identical support (and hence distribution).

The current paper imagines two realisations of random measures written as a sum of a common random measure and of one of two separate almost independent random measures: (14) is the core formula of the paper that allows for partial or total equality. An extension to a setting with more than two samples seems complicated if only because of the number of common measures one has to introduce, from the totally common measure to measures that are only shared by a subset of the samples. Except in the simplified framework when a single and universally common measure is adopted (with enough justification). The randomness of the model is handled via different completely random measures that involve something like four degrees of hierarchy in the Bayesian model.

Since the example is somewhat central to the paper, the case of one or rather two two-component Normal mixtures with a common component (but with different mixture weights) is handled by the approach, although it seems that it was already covered by HDP. Having exactly the same term (i.e., with the very same weight) is not, but this may be less interesting in real life applications. Note that alternative, easily constructed, parametric constructs are already available in this specific case, involving a limited prior input and a lighter computational burden, although the Gibbs sampler behind the model proves extremely simple on paper. (One may wonder at the robustness of the sampler once the case of identical distributions is visited.)

Due to the combinatorial explosion associated with a higher number of observed samples, despite obvious practical situations, one may wonder at any feasible (and possibly sequential) extension that would further keep a coherence under marginalisation (in the number of samples). And also whether or not multiple testing could be coherently envisioned in this setting, for instance when handling all hospitals in the UK. Another consistency question covers the Bayes factor used to assess whether or not the two distributions behind the samples are identical. (One may wonder at the importance of the question, hopefully applied to more relevant datasets than the Iris data!)

parallelizable sampling method for parameter inference of large biochemical reaction models

Posted in Books, Statistics on June 18, 2018 by xi'an

I came across this older (2016) arXiv paper by Jan Mikelson and Mustafa Khammash [antedated as of April 25, 2018] as another version of nested sampling. The novelty of the approach is in applying nested sampling for approximating the likelihood function in the case of involved hidden Markov models (although the name itself does not appear in the paper). This is an interesting proposal, even though there is a fairly large and very active literature on computational approaches to such objects, from sequential Monte Carlo (SMC) to particle MCMC (pMCMC), to SMC².

“We found a way to efficiently sample parameter vectors (particles) from the super level set of the likelihood (sets of particles with a likelihood equal to or higher than some threshold) corresponding to an increasing sequence of thresholds” (p.2)

The approach here is an aggregate of nested sampling and particle filters (SMC). The filters are paradoxically employed in approximating the likelihood function itself, and are thus called repeatedly as the value of the parameter θ changes, unless I am confused. It seems to me that, once started with particle filters, the authors could have used them all the way to the upper level (through, again, SMC²). Instead, and that brings a further degree of (uncorrected) approximation to the procedure, a Dirichlet process prior is used to estimate Gaussian mixture approximations to the true posterior distribution(s) on the (super) level sets. Now, approximating a distribution that is zero outside a compact set [the prior restricted to the likelihood being larger than the current threshold] by a distribution with an infinite support does not a priori sound like a particularly enticing idea. Note also that there is no later correction for using the mixture approximation to the restricted prior. (The method also involves an approximation of the (Lebesgue) volume of the level sets that may be poor in higher dimensions.)

“DP-GMM estimations work very well in high dimensional spaces and since we use rejection sampling to obtain samples from the level set by sampling from the DP-GMM estimation, the estimation error does not get propagated through iterations.” (p.13)

One aspect of the paper that puzzles me is the use of a rejection sampler to produce new parameter simulations from a given (super) level set, as this involves a lower bound M on the Gaussian mixture approximation over this level set. If a Gaussian mixture approximation is available, there is apparently no need for this as it can be sampled directly and values below the threshold can be disposed of. It is also unclear why the error does not propagate from one level to the next, if only because of the connection between the successive particle approximations.
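The direct alternative described above (sample the mixture, discard values below the threshold) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: `MIXTURE` is a hypothetical fitted one-dimensional Gaussian mixture and `log_likelihood` a toy function standing in for the particle-filter estimate. Note that the retained draws follow the mixture truncated to the level set, not the prior restricted to it, which is precisely the uncorrected approximation at stake.

```python
import random

# Hypothetical fitted Gaussian mixture: (weight, mean, standard deviation).
MIXTURE = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]


def log_likelihood(theta):
    # Toy log-likelihood peaked at theta = 1 (stand-in for an SMC estimate).
    return -0.5 * (theta - 1.0) ** 2


def sample_mixture(rng):
    # Pick a component by inverting the cumulative weights, then draw from it.
    u, cum = rng.random(), 0.0
    for weight, mean, sd in MIXTURE:
        cum += weight
        if u < cum:
            return rng.gauss(mean, sd)
    # Guard against rounding in the cumulative sum: fall back to last component.
    return rng.gauss(MIXTURE[-1][1], MIXTURE[-1][2])


def sample_level_set(n, log_threshold, seed=0):
    rng = random.Random(seed)
    kept, tried = [], 0
    while len(kept) < n:
        theta = sample_mixture(rng)
        tried += 1
        # Keep only draws inside the super level set {L(theta) >= threshold}.
        if log_likelihood(theta) >= log_threshold:
            kept.append(theta)
    return kept, n / tried


draws, keep_rate = sample_level_set(1000, log_threshold=-2.0)
```

No bound M on the mixture density is needed here; the only cost is the fraction of discarded draws, reported as `keep_rate`.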

 

a Ca’Foscari [first Italian-French statistics seminar]

Posted in Kids, pictures, Statistics, Travel, University life on October 26, 2017 by xi'an

Apart from subjecting my [surprisingly large!] audience to three hours of ABC tutorial today, and after running Ponte della Libertà to Mestre and back in a deep fog, I attended the second part of the 1st Italian-French statistics seminar at Ca’Foscari, Venetiarum Universitas, with talks by Stéfano Tonellato and Roberto Casarin. Stéfano discussed a most interesting if puzzling notion of clustering via Dirichlet process mixtures. Which indeed puzzles me for its dependence on the Dirichlet measure and on the potential for an unlimited number of clusters as the sample size increases. The method offers similarities with an approach from our 2000 JASA paper on running inference on mixtures without proper label switching, in that looking at pairs of observations allocated to the same cluster is revealing about the [true or pseudo-true] number of clusters. With a divergence in using eigenvalues of Laplacians on similarity matrices. But because of the potential for the number of components to diverge I wonder at the robustness of the approach via non-parametric [Bayesian] modelling. Maybe my difficulty stands with the very notion of cluster, which I find poorly defined and mostly in the eyes of the beholder! And Roberto presented a recent work on SURE and VAR models, with a great graphical representation of the estimated connections between factors in a sparse graphical model.
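As a rough illustration of the two ingredients mentioned above (pairwise co-allocation, which is immune to label switching, and eigenvalues of a Laplacian built on the resulting similarity matrix), here is a minimal sketch. It is not Tonellato's method: the toy allocation draws, the function names, and the rule of counting near-zero eigenvalues are all assumptions made for illustration.

```python
import numpy as np


def coclustering_matrix(allocations):
    # allocations: (n_draws, n_obs) array of cluster labels, one row per MCMC draw.
    alloc = np.asarray(allocations)
    # same[t, i, j] is True when observations i and j share a label in draw t.
    same = alloc[:, :, None] == alloc[:, None, :]
    return same.mean(axis=0)  # proportion of draws with i and j together


def laplacian_eigenvalues(similarity):
    s = np.array(similarity, dtype=float)
    np.fill_diagonal(s, 0.0)               # no self-loops
    laplacian = np.diag(s.sum(axis=1)) - s  # unnormalised graph Laplacian D - S
    return np.linalg.eigvalsh(laplacian)    # sorted, real (matrix is symmetric)


# Toy draws: labels are permuted across draws but the partition is the same.
toy_draws = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [2, 2, 2, 0, 0, 0],
]
S = coclustering_matrix(toy_draws)
eigenvalues = laplacian_eigenvalues(S)
# The number of (near-)zero eigenvalues equals the number of connected blocks.
n_blocks = int(np.sum(eigenvalues < 1e-8))
```

With real MCMC output the blocks are never this clean, so one would look for a gap in the spectrum instead of exact zeros, and the answer remains sensitive to how many small clusters the Dirichlet process mixture creates.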
