Archive for mixture of distributions

adaptive incremental mixture MCMC

Posted in Statistics on August 12, 2022 by xi'an

Sadly, I missed this adaptive incremental mixture MCMC paper by my friends Florian Maire, Nial Friel, Antonietta Mira, and Adrian E. Raftery when it came out in JCGS in 2019. The core of the paper is the construction of a time-inhomogeneous mixture independent proposal: starting from an initial distribution, a new component is added whenever the chain hits a point where the target-to-proposal ratio is large, as this signals a part of the space that is not well enough explored, while the other components are left unchanged apart from a proportional decrease of their weights. This construction reminded me of the paper of Gåsemyr (2003), which in some ways inspired our own population Monte Carlo sampler. Obviously, there is a what-you-get-is-what-you-see drawback to the approach, in that regions where this ratio is high may never be visited by the chain, and hence never trigger a new component, despite the adaptivity of the proposal.

The added component is Normal, centred at the associated (accepted) proposed value ø, with a covariance matrix locally estimated from past iterations of the algorithm, and with a weight proportional to the (powered) target density at ø, which does not require a normalising constant. The method however requires setting a certain number of calibration parameters, like the power γ for the weight, the lower bound M on the target-to-proposal ratio that triggers a new component, and the rate of diminishing adaptation (which is also needed for ergodicity à la Roberts and Rosenthal, 2007), plus the implicit choice of a parameterisation for which a Normal mixture is close enough to the target. In the posted experiments, the number of components in the mixture does not grow to unmanageable figures, but a further adaptation could consist in removing components that are inactive or lead to systematic rejection, as we did in the population Monte Carlo paper.
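To make this more concrete, here is a rough R sketch of the incremental rule in one dimension, with a toy bimodal target and arbitrary tuning values (γ, M, the variance of the added component); it only mimics the component-adding mechanism, not the authors' full AIMM algorithm with its diminishing adaptation and local covariance estimates.

dtarget=function(x) .5*dnorm(x,-3)+.5*dnorm(x,3) #toy bimodal target
prop=list(mu=0,s2=4,w=1)                         #initial Normal proposal
dprop=function(x,p) sum(p$w*dnorm(x,p$mu,sqrt(p$s2)))
addcomp=function(p,x,gam=.5,s2new=1){
  #new Normal component centred at x, weight proportional to the powered target
  p$mu=c(p$mu,x);p$s2=c(p$s2,s2new) #AIMM uses a local covariance estimate instead
  p$w=c(p$w,dtarget(x)^gam)
  p$w=p$w/sum(p$w)                  #older weights decrease proportionally
  p}
x=0;M=20                            #M: threshold on the target-to-proposal ratio
for(t in 1:1e3){
  k=sample(length(prop$w),1,prob=prop$w)
  y=rnorm(1,prop$mu[k],sqrt(prop$s2[k]))  #independent mixture proposal
  if(runif(1)<dtarget(y)*dprop(x,prop)/dtarget(x)/dprop(y,prop)){
    x=y
    if(dtarget(x)/dprop(x,prop)>M)  #poorly covered region: add a component there
      prop=addcomp(prop,x)}}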

composition versus inversion

Posted in Books, Kids, R, Statistics on March 31, 2021 by xi'an

While trying to convey to an OP on X validated why the inversion method is not always a panacea in pseudo-random generation, I took the example of a mixture of K exponential distributions with K very large, in order to impress (?) upon said OP that solving F(x)=u for such a closed-form cdf F is very costly even when using a state-of-the-art (?) root finder like uniroot, since each evaluation involves adding the K terms in the cdf. Selecting the component by inverting the cumulative distribution of the component weights, and then simulating from that exponential, proves to be quite fast, since using the rather crude

x=rexp(1,la[1+sum(runif(1)>wes)]) #la: component rates, wes: cumulative weights

brings a 100-fold improvement over

Q = function(u) uniroot(function(x) F(x) - u, lower = 0,
    upper = qexp(.999,rate=min(la)))$root #upper bound: a tail quantile of the slowest component
x=Q(runif(1))

when K=10⁵, as shown by a benchmark call

         test elapsed
1       compo   0.057
2      Newton  45.736
3     uniroot   5.814

where Newton denotes a simple-minded Newton inversion. I wonder if there is a faster way to select the component in the mixture. Using a while loop starting from the most likely components proves much slower, and accept-reject solutions are invariably slow or fail to work with such a large number of components. Devroye’s Bible has a section (XIV.7.5) on simulating sums of variates from an infinite mixture distribution but, for once, nothing really helpful there, and another section (IV.5) on series methods, where again I could not find a direct connection.
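For completeness, here is a self-contained sketch of the composition and uniroot entries of the comparison, with K, the rates la, the weights, their cumulative sums wes, and the mixture cdf F reconstructed with arbitrary values since the post does not spell them out:

K=1e5
la=rexp(K)                       #hypothetical component rates
w=rep(1/K,K)                     #mixture weights
wes=cumsum(w)                    #cumulative weights used in the composition step
F=function(x) sum(w*pexp(x,la))  #mixture cdf, K terms per evaluation
rcompo=function() rexp(1,la[1+sum(runif(1)>wes)]) #composition
Q=function(u) uniroot(function(x) F(x)-u,lower=0,
    upper=qexp(.999,rate=min(la)))$root           #numerical inversion
system.time(for(i in 1:100) rcompo())
system.time(for(i in 1:100) Q(runif(1)))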

population quasi-Monte Carlo

Posted in Books, Statistics on January 28, 2021 by xi'an

“Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which utilizes a population of proposals to generate weighted samples that approximate the target distribution”

A return of the prodigal son!, with this arXival by Huang, Joseph, and Mak of a paper on population Monte Carlo using quasi-random sequences. The construct is based on an earlier notion of Joseph and Mak, support points, which are defined with respect to a given target distribution F as point sets minimising the (energy) distance between their empirical distribution and F. (I would have used instead my late friend Bernhard Flury’s principal points!) The proposal uses Owen-style scrambled Sobol points, followed by a deterministic mixture weighting à la PMC, followed by importance support resampling to find the next location parameters of the proposal mixture (which is why I included an unrelated mixture surface as my post picture!). This importance support resampling is obviously less variable than the more traditional ways of resampling, but the cost moves from O(M) to O(M²).

“The main computational complexity of the algorithm is O(M²) from computing the pairwise distance of the M weighted samples”

The covariance parameters are updated as in our 2008 paper. This new proposal is interesting and reasonable, with apparently significant gains, although I would have liked to see a clearer discussion of the actual computing cost of PQMC.
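To fix ideas on the deterministic mixture weighting à la PMC mentioned above, here is a small R sketch with a toy target and made-up proposal parameters, where plain multinomial resampling stands in for the importance support resampling of the paper:

dtarget=function(x) dnorm(x,2,.5)          #toy target
mu=c(-1,0,1);sig=rep(1,3)                  #current proposal components
M=1e3
d=sample(1:3,M,replace=TRUE)               #component each draw comes from
x=rnorm(M,mu[d],sig[d])
#deterministic mixture: weight each draw against the whole mixture,
#not only against the component it was simulated from
qmix=rowMeans(sapply(1:3,function(j) dnorm(x,mu[j],sig[j])))
w=dtarget(x)/qmix
w=w/sum(w)
munew=sample(x,3,replace=TRUE,prob=w)      #next location parameters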

the strange occurrence of the one bump

Posted in Books, Kids, R, Statistics on June 8, 2020 by xi'an

When answering an X validated question on running an accept-reject algorithm for the Gamma distribution, using a mixture of a Beta and a drifted (by 1) Exponential distribution, I came across a glitch in the fit of my 10⁷ simulated sample to the target, apparently displaying a wrong proportion of simulations above (or below) one.

a=.9
g<-function(T){
  x=rexp(T)
  v=rt(T,1)<0                #fair coin deciding on the mixture component
  x=c(1+x[v],exp(-x/a)[!v])  #drifted Exponential (x>1) or Beta(a,1) (x<1) draws
  x[runif(T)<x^a/x/exp(x)/((x>1)*exp(1-x)+a*(x<1)*x^a/x)*a]} #accept-reject step

It took me a while to spot the issue, namely that the output of

  z=g(T)
  while(sum(!!z)<T)z=c(z,g(T)) #sum(!!z) is just the number of accepted values
  z[1:T]                       #truncation keeps the first T accepted values

was favouring simulations from the drifted Exponential by truncating: within each call to g, the accepted Exponential draws come first in the output vector, so keeping only the first T values retains a disproportionate share of them. Permuting the elements of z before returning solved the issue (as checked for a=½)!
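For the record, a fixed wrapper (a sketch, with G an arbitrary name) simply permutes the accepted values before truncating:

G=function(T){
  z=g(T)
  while(length(z)<T)z=c(z,g(T))
  sample(z)[1:T]} #random permutation removes the ordering bias
z=G(1e6)
mean(z>1) #should now be close to the Gamma(a,1) tail probability 1-pgamma(1,a)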

MiMo2020

Posted in Statistics on January 24, 2020 by xi'an

On 26 and 27 March 2020, the maths department of the Université de Rouen Normandie, France, organises a (free) workshop on mixture distributions, with the following speakers:

    • Christophe Biernacki  (Laboratoire Paul Painlevé, Univ. Lille 1 et INRIA)
    • Vincent Brault (Laboratoire Jean Kuntzmann, Univ. Grenoble Alpes)
    • Gilles Celeux  (Laboratoire de Mathématiques d’Orsay, Univ. Paris Sud et INRIA)
    • Elisabeth Gassiat  (Laboratoire de Mathématiques d’Orsay, Univ. Paris Sud)
    • Van Hà Hoang  (Laboratoire de Mathématique Raphaël Salem, Univ. Rouen Normandie)
    • Hajo Holzmann  (Philipps-University Marburg, Germany)
    • Dimitri Karlis  (Department of Statistics, Athens University of Economics and Business, Greece)
    • Trung Tin Nguyen (LMNO, Univ. Caen Normandie)
    • Andrea Rau  (Département de Génétique Animale, INRA, Jouy en Josas)
    • Pierre Vandekerkhove  (Laboratoire d’Analyse et de Mathématiques Appliquées, Univ. Paris-Est Marne-la-Vallée)
    • Cinzia Viroli  (Department of Statistical Sciences, Università di Bologna, Italy)

Unfortunately, although this is my former department, I will not be able to attend, as I am taking part in the SIAM Conference on Uncertainty Quantification (UQ20) on the very same days, in a session on likelihood-free inference.
