Archive for mixture estimation

Bill’s 80th!!!

Posted in pictures, Statistics, Travel, University life on April 17, 2022 by xi'an

“It was the best of times,
it was the worst of times”
[Dickens’ Tale of Two Cities (which plays a role in my friendship with Bill!)]

My flight to NYC last week was uneventful and rather fast, and I worked rather well, even though the seat in front of me was reclined to the max for the entire flight! (Still got glimpses of Aline and of Deepwater Horizon from my neighbours.) Taking a very early flight from Paris was great for making a full day once in NYC, but it “forced” me to take a taxi, which almost ended in disaster since the Über driver did not show up. At all. And never replied to my message. Fortunately trains were running, I was also running despite the broken rib, and I arrived at the airport some time before access was closed, grateful for the low activity that day. I also had another bit of a worrying moment at the US border control in JFK as I ended up in a back-office of the Border Police after the machine could not catch my fingerprints. And another stop at the luggage control as my lack of luggage sounded suspicious!

The conference was delightful in celebrating Bill’s career and kindness (tinted with the most gentle irony!). Among stories told at the banquet, I was surprised to learn of Bill’s jazz career side, as I had never heard him play the piano or the clarinet! Even though we had chatted about music and literature on many occasions. Since our meeting in 1989… The (scientific side of the) conference included many talks around shrinkage, from loss estimation to predictive estimation, reminding me of the roaring 70’s and 80’s [James-Stein wise]. And demonstrating the impact of Bill’s work throughout this era (incl. on my own PhD thesis). I started wondering at the (Bayesian) use of the loss estimate, though, as I set myself facing two point estimators attached with two estimators of their loss: it did not seem a particularly good idea to systematically pick the one with the smallest loss estimate (and Jim Berger confirmed this feeling in a later discussion).

Among the talks on less familiar topics (of mine), I discovered work of Genevera Allen’s on inferring massive networks for neuron connections under sparse information. And of Emma Jingfei Zhang, equally centred on network inference, with applications to brain connectivity.

In a somewhat remote connection with Bill’s work (and our joint and hilarious assessment of Pitman closeness), I presented part of our joint and current work with Adrien Hairault and Judith Rousseau on inferring the number of components in a mixture by Bayes factors when the alternative is an infinite mixture (i.e., a Dirichlet process mixture). Of which Ruobin Gong gave a terrific discussion. (With a connection to her current work on Sense and Sensitivity.)

I was most sorry to miss Larry Wasserman’s and Rob Strawderman’s talks to rush back to the airport, all the more because I am sure Larry’s talk would have brought a new light on causality (possibly equating it with tequila and mixtures!). The flight back was uneventful, the plane rather empty, and I slept most of the time. Overall, it was most wonderful to re-connect with so many friends. Most of whom I had not seen for ages, even before the pandemic. And to meet new friends. (Nothing original in the reported feeling, just telling that the break in conferences and workshops was primarily a hatchet job on social relations and friendships.)

Bruce Lindsay (March 7, 1947 — May 5, 2015)

Posted in Books, Running, Statistics, Travel, University life on May 22, 2015 by xi'an

When early registering for Seattle (JSM 2015) today, I discovered on the ASA webpage the very sad news that Bruce Lindsay had passed away on May 5. While Bruce was not a very close friend, we had met and interacted enough times for me to feel quite strongly about his most untimely death. Bruce was indeed “Mister mixtures” in many ways and I have always admired the unusual and innovative ways he had found for analysing mixtures. Including algebraic ones through the rank of associated matrices. Which is why I first met him, besides a few words at the 1989 Gertrude Cox (first) scholarship race in Washington DC, at the workshop I organised with Gilles Celeux and Mike West in Aussois, French Alps, in 1995. After this meeting, we met twice in Edinburgh at ICMS workshops on mixtures, organised with Mike Titterington. I remember sitting next to Bruce at one workshop dinner (at Blonde) and him talking about his childhood in Oregon and his father being a journalist and how this induced him to become an academic. He also contributed a chapter on estimating the number of components [of a mixture] to the Wiley book we edited out of this workshop. Obviously, his work extended beyond mixtures to a general neo-Fisherian theory of likelihood inference. (Bruce was certainly not a Bayesian!) The last time I met him was in Italia, at a likelihood workshop in Venezia, in October 2012, mixing Bayesian nonparametrics, intractable likelihoods, and pseudo-likelihoods. He gave a survey talk about composite likelihood, telling me about his extended stay in Italy (Padua?) around that time… So, Bruce, I hope you are now running great marathons in a place so full of mixtures that you can always keep ahead of the pack! Fare well!

 

Le Monde puzzle [#902]

Posted in Books, Kids, Statistics, University life on March 8, 2015 by xi'an

Another arithmetic Le Monde mathematical puzzle:

Is it possible to partition the set of the integers between 1 and 15 in such a way that the product of the terms in the first set is equal to the sum of the members of the second set? Can this be generalised to an arbitrary set {1,2,..,n}? What happens if instead we only consider the odd integers in those sets?

I used brute force by looking at random for a solution,

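# random search for a product/sum partition of {1,...,N}
pb <- txtProgressBar(min = 0, max = 100, style = 3)
for (N in 5:100){
  sol=FALSE
  while (!sol){
    # size of the product set, never equal to N since its weight is zero
    k=sample(1:N,1,prob=(1:N)*(N-(1:N)))
    pro=sample(1:N,k)
    sol=(prod(pro)==sum((1:N)[-pro]))
  }
  setTxtProgressBar(pb, N)
}
close(pb)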

and while it took a while to run the R code, it eventually got out of the loop, meaning there was at least one solution for all n’s between 5 and 100. (It does not work for n=1,2,3,4, for obvious reasons.) For instance, when n=15, the integers in the product part are either (3,5,7), (1,7,14), or (1,9,11). Jean-Louis Foulley sent me an explanation: when n is odd, n=2p+1, one solution is (1,p,2p), while when n is even, n=2p, one solution is (1,p-1,2p).
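The closed-form solution quoted above can be checked directly rather than by random search. Indeed, for n=2p+1 the integers outside (1,p,2p) sum to (2p+1)(p+1)-(3p+1)=2p², which is the product 1·p·2p. Here is a minimal sketch of that check in R, where the helper names closed_form and is_solution are made up for the illustration:

```r
# Product set proposed by the closed-form solution (hypothetical helper):
# (1, p, 2p) when n = 2p + 1 and (1, p - 1, 2p) when n = 2p
closed_form <- function(n) {
  p <- n %/% 2
  if (n %% 2 == 1) c(1, p, 2 * p) else c(1, p - 1, 2 * p)
}

# TRUE when the product of `pro` equals the sum of the other integers in 1..n
is_solution <- function(pro, n) prod(pro) == sum((1:n)[-pro])

# check every n covered by the random search above
all(sapply(5:100, function(n) is_solution(closed_form(n), n)))
```

For n=15 the helper returns (1,7,14), one of the three product sets listed above.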

A side remark on the R code: thanks to a Cross Validated question by Paulo Marques, on which I thought I had commented on this blog, I learned about the progress bar function in R, setTxtProgressBar(), which makes running R code with loops much nicer!

For the second question, I just adapted the R code to exclude even integers:

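N=15  # odd upper bound; the loop never ends when no solution exists
sol=FALSE
while (!sol){
  # size of the product set, between 1 and the number of odd integers
  k=1+trunc(sample(1:N,1)/2)
  pro=sample(seq(1,N,by=2),k)
  cum=(1:N)[-pro]
  # only the odd integers outside the product set enter the sum
  sol=(prod(pro)==sum(cum[cum%%2==1]))
}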

and found a solution for n=15, namely 1,3,15 versus 5,7,9,11,13. However, there does not seem to be a solution for all n’s: I found solutions for n=15,21,23,31,39,41,47,49,55,59,63,71,75,79,87,95…
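Since the random search cannot prove that a solution is missing, small values of n can be double-checked by exhaustive enumeration. The following is only a sketch under my own assumptions: the function odd_partition is hypothetical, it excludes the empty and the full product sets, and it is only practical for small n because the number of subsets doubles with each extra odd integer.

```r
# Exhaustive search over the non-empty proper subsets of the odd integers
# in 1..n (hypothetical helper); returns the first product set found, or NULL
odd_partition <- function(n) {
  odds <- seq(1, n, by = 2)
  m <- length(odds)
  if (m < 2) return(NULL)
  total <- sum(odds)
  for (code in 1:(2^m - 2)) {
    # the binary digits of code flag the integers sent to the product set
    inset <- (code %/% 2^(0:(m - 1))) %% 2 == 1
    pro <- odds[inset]
    if (prod(pro) == total - sum(pro)) return(pro)
  }
  NULL
}

odd_partition(15)  # returns 1 3 15
odd_partition(7)   # returns NULL, no solution for n=7
```

Running it over, say, seq(5,25,by=2) gives a quick way of confirming the first entries in the list above.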

Overfitting Bayesian mixture models with an unknown number of components

Posted in Statistics on March 4, 2015 by xi'an

During my Czech vacations, Zoé van Havre, Nicole White, Judith Rousseau, and Kerrie Mengersen posted on arXiv a paper on overfitting mixture models to estimate the number of components. This is directly related to Judith and Kerrie’s 2011 paper and to Zoé’s PhD topic. The paper also returns to the vexing (?) issue of label switching! I very much like the paper, and not only because the authors are good friends, but also because it brings a solution to an approach I briefly attempted with Marie-Anne Gruet in the early 1990’s, just before finding out about the reversible jump MCMC algorithm of Peter Green at a workshop in Luminy and considering we were not going to “beat the competition”! Hence not publishing the output of our over-fitted Gibbs samplers that were nicely emptying extra components… It also brings a rebuke to a later assertion of mine at an ICMS workshop on mixtures, where I defended the notion that over-fitted mixtures could not be detected, a notion that was severely disputed by David MacKay…

What is so fantastic in Rousseau and Mengersen (2011) is that a simple constraint on the Dirichlet prior on the mixture weights suffices to guarantee that asymptotically superfluous components will empty out and signal they are truly superfluous! The authors here combine the over-fitted mixture with a tempering strategy, which seems somewhat redundant, the number of extra components being a sort of temperature, but eliminates the need for fragile RJMCMC steps. Label switching is obviously even more of an issue with a larger number of components and identifying empty components seems to require a lack of label switching for some components to remain empty!

When reading through the paper, I came upon the condition that only the priors of the weights are allowed to vary between temperatures. Distinguishing the weights from the other parameters does make perfect sense, as some representations of a mixture work without those weights. Still I feel a bit uncertain about the fixed prior constraint, even though I can see the rationale in not allowing for complete freedom in picking those priors. More fundamentally, I am less and less happy with independent identical or exchangeable priors on the components.

Our own recent experience with almost zero weights mixtures (and with Judith, Kaniav, and Kerrie) suggests not using solely a Gibbs sampler there as it shows poor mixing. And even poorer label switching. The current paper does not seem to meet the same difficulties, maybe thanks to (prior) tempering.

The paper proposes a strategy called Zswitch to resolve label switching, which amounts to identifying a MAP for each possible number of components and a subsequent relabelling. Even though I do not entirely understand the way the permutation is constructed. I wonder in particular at the cost of the relabelling.

Robert’s paradox [reading in Reading]

Posted in Statistics, Travel, University life on January 28, 2015 by xi'an

On Wednesday afternoon, Richard Everitt and Dennis Prangle organised an RSS workshop in Reading on Bayesian Computation. And invited me to give a talk there, along with John Hemmings, Christophe Andrieu, Marcelo Pereyra, and themselves. Given the proximity between Oxford and Reading, this felt like a neighbourly visit, especially when I realised I could take my bike on the train! John Hemmings gave a presentation on synthetic models for climate change and their evaluation, which could have some connection with Tony O’Hagan’s recent talk in Warwick; Dennis told us about “the lazier ABC” version in connection with his “lazy ABC” paper; [from my very personal view] Marcelo expanded on the Moreau-Yosida expansion he had presented in Bristol about six months ago, with the notion that using a Gaussian tail regularisation of a super-Gaussian target in a Langevin algorithm could produce better convergence guarantees than the competition, including Hamiltonian Monte Carlo; Luke Kelly spoke about an extension of phylogenetic trees using a notion of lateral transfer; and Richard introduced a notion of biased approximation to Metropolis-Hastings acceptance ratios, a notion that I found quite attractive if not completely formalised, as there should be a Monte Carlo equivalent to the improvement brought by biased Bayes estimators over unbiased classical counterparts. (Repeating a remark by Persi Diaconis made more than 20 years ago.) Christophe Andrieu also presented some recent developments of his on exact approximations à la Andrieu and Roberts (2009).

Since those developments are not yet finalised into an archived document, I will not delve into the details, but I found the results quite impressive and worth exploring, so I am looking forward to the forthcoming publication. One aspect of the talk which I can comment on is related to the exchange algorithm of Murray et al. (2006). Let me recall that this algorithm handles doubly intractable problems (i.e., likelihoods with intractable normalising constants like the Ising model), by introducing auxiliary variables with the same distribution as the data given the new value of the parameter and computing an augmented acceptance ratio whose expectation is the targeted acceptance ratio and which conveniently removes the unknown normalising constants. This auxiliary scheme produces a random acceptance ratio and hence differs from the exact-approximation MCMC approach, which directly targets the intractable likelihood. It somewhat replaces the unknown constant with the density taken at a plausible realisation, hence providing a proper scale. At least for the new value. I wonder if a comparison has been conducted between both versions, the naïve intuition being that the ratio of estimates should be more variable than the estimate of the ratio. More generally, it seemed to me [during the introductory part of Christophe’s talk] that those different methods always faced a harmonic mean danger when being phrased as expectations of ratios, since those ratios were not necessarily square integrable. And not necessarily bounded. Hence my rather gratuitous suggestion of using other tools than the expectation, like maybe a median, thus circling back to the biased estimators of Richard. (And later cycling back, unscathed, to Reading station!)
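To make the auxiliary variable step more concrete, here is a minimal sketch of the exchange algorithm on a toy model of my own choosing, not one from the talk: exponential observations with rate θ, handled as if the normalising constant of exp(-θx) were unknown. The Gamma(1,1) prior, the random walk scale, and all function names are assumptions made for the illustration. The model is picked because the true posterior is a Gamma distribution, which gives something to compare the output with.

```r
# Toy exchange algorithm (after Murray et al., 2006), illustration only:
# y_i ~ Exp(theta), pretending the normalising constant 1/theta is unknown
set.seed(42)
n <- 20
y <- rexp(n, rate = 2)   # simulated data with true rate 2 (assumption)
a <- 1; b <- 1           # Gamma(a, b) prior on theta (assumption)

# log of the unnormalised likelihood q(x | theta) = exp(-theta * sum(x))
log_q <- function(x, theta) -theta * sum(x)
log_prior <- function(theta) dgamma(theta, shape = a, rate = b, log = TRUE)

exchange_sampler <- function(n_iter, theta0 = 1, step = 0.5) {
  theta <- numeric(n_iter)
  cur <- theta0
  for (t in 1:n_iter) {
    prop <- cur + rnorm(1, sd = step)   # symmetric random walk proposal
    if (prop > 0) {                     # the prior is zero outside (0, Inf)
      w <- rexp(n, rate = prop)         # auxiliary data drawn at the new value
      # augmented log-ratio, free of the normalising constants
      log_r <- log_q(y, prop) - log_q(y, cur) +
        log_prior(prop) - log_prior(cur) +
        log_q(w, cur) - log_q(w, prop)
      if (log(runif(1)) < log_r) cur <- prop
    }
    theta[t] <- cur
  }
  theta
}

chain <- exchange_sampler(20000)
post_mean_exact <- (a + n) / (b + sum(y))   # mean of the conjugate posterior
c(exchange = mean(chain[-(1:2000)]), exact = post_mean_exact)
```

The two numbers printed at the end should be close, since the chain targets the exact posterior even though each acceptance ratio is random.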

On top of the six talks in the afternoon, there was a small poster session during the tea break, where I met Garth Holloway, working in agricultural economics, who happened to be an (unsuspected) fan of mine!, to the point of entitling his poster “Robert’s paradox”!!! The problem covered by this undeserved denomination connected to the bias in Chib’s approximation of the evidence in mixture estimation, a phenomenon that I related to the exchangeability of the component parameters in an earlier paper or set of slides. So “my” paradox is essentially label (un)switching and its consequences. For which I cannot claim any fame! Still, I am looking forward to the completed version of this poster to discuss Garth’s solution, but we had a beer together after the talks, drinking to the health of our mutual friend John Deely.
