Archive for mixture of distributions

composition versus inversion

Posted in Books, Kids, R, Statistics with tags , , , , , , , on March 31, 2021 by xi'an

While trying to convey to an OP on X validated why the inversion method was not always the panacea in pseudo-random generation, I took the example of a mixture of K exponential distributions when K is very large, in order to impress (?) upon said OP that solving F(x)=u for such a closed-form cdf F was very costly even when using a state-of-the-art (?) inversion algorithm, like uniroot, since each step involves adding the K terms in the cdf. Selecting the component from the cumulative distribution function on the component proves to be quite fast since using the rather crude

x=rexp(1,lambda[1+sum(runif(1)>wes)])

brings a 100-fold improvement over

Q = function(u) uniroot((function(x) F(x) - u), lower = 0, 
    upper = qexp(.999,rate=min(la)))[1] #numerical tail quantile
x=Q(runif(1))

when K=10⁵, as shown by a benchmark call

         test elapsed
1       compo   0.057
2      Newton  45.736
3     uniroot   5.814

where Newton denotes a simple-minded Newton inversion. I wonder if there is a faster way to select the component in the mixture. Using a while loop starting from the most likely components proves to be much slower. And accept-reject solutions are invariably slow or fail to work with such a large number of components. Devroye’s Bible has a section (XIV.7.5) on simulating sums of variates from an infinite mixture distribution, but, for once,  nothing really helpful. And another section (IV.5) on series methods, where again I could not find a direct connection.

population quasi-Monte Carlo

Posted in Books, Statistics with tags , , , , , , , , , , , , on January 28, 2021 by xi'an

“Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which utilizes a population of proposals to generate weighted samples that approximate the target distribution”

A return of the prodigal son!, with this arXival by Huang, Joseph, and Mak, of a paper on population Monte Carlo using quasi-random sequences. The construct is based on an earlier notion of Joseph and Mak, support points, which are defined wrt a given target distribution F as minimising the variability of a sample from F away from these points. (I would have used instead my late friend Bernhard Flury’s principal points!) The proposal uses Owen-style scrambled Sobol points, followed by a deterministic mixture weighting à la PMC, followed by importance support resampling to find the next location parameters of the proposal mixture (which is why I included an unrelated mixture surface as my post picture!). This importance support resampling is obviously less variable than the more traditional ways of resampling but the cost moves from O(M) to O(M²).

“The main computational complexity of the algorithm is O(M²) from computing the pairwise distance of the M weighted samples”

The covariance parameters are updated as in our 2008 paper. This new proposal is interesting and reasonable, with apparent significant gains, albeit I would have liked to see a clearer discussion of the actual computing costs of PQMC.

the strange occurrence of the one bump

Posted in Books, Kids, R, Statistics with tags , , , , , , , , on June 8, 2020 by xi'an

When answering an X validated question on running an accept-reject algorithm for the Gamma distribution by using a mixture of Beta and drifted (bt 1) Exponential distributions, I came across the above glitch in the fit of my 10⁷ simulated sample to the target, apparently displaying a wrong proportion of simulations above (or below) one.

a=.9
g<-function(T){
  x=rexp(T)
  v=rt(T,1)<0
  x=c(1+x[v],exp(-x/a)[!v])
  x[runif(T)<x^a/x/exp(x)/((x>1)*exp(1-x)+a*(x<1)*x^a/x)*a]}

It took me a while to spot the issue, namely that the output of

  z=g(T)
  while(sum(!!z)<T)z=c(z,g(T))
  z[1:T]

was favouring simulations from the drifted exponential by truncating. Permuting the elements of z before returning solved the issue (as shown below for a=½)!

MiMo2020

Posted in Statistics with tags , , , , , , , , on January 24, 2020 by xi'an

On 26 and 27 March 2020, the maths department of the Université of Rouen, Normandy, France, organizes a (free) workshop on mixture distributions. With the following speakers

    • Christophe Biernacki  (Laboratoire Paul Painlevé, Univ. Lille 1 et INRIA)
    • Vincent Brault (Laboratoire Jean Kuntzmann, Univ. Grenoble Alpes)
    • Gilles Celeux  (Laboratoire de Mathématiques d’Orsay, Univ. Paris Sud et INRIA)
    • Elisabeth Gassiat  (Laboratoire de Mathématiques d’Orsay, Univ. Paris Sud)
    • Van Hà Hoang  (Laboratoire de Mathématique Raphaël Salem, Univ. Rouen Normandie)
    • Hajo Holzmann  (Philipps-University Marburg, Germany)
    • Dimitri Karlis  (Department of Statistics, Athens University of Economics and Business, Greece)
    • Trung Tin Nguyen (LMNO, Univ. Caen Normandie)
    • Andrea Rau  (Département de Génétique Animale, INRA, Jouy en Josas)
    • Pierre Vandekerkhove  (Laboratoire d’Analyse et de Mathématiques Appliquées, Univ. Paris-Est Marne-la-Vallée)
    • Cinzia Viroli  (Department of Statistical Sciences, Universita di Bologna, Italia)

Unfortunately, since this is my former department, I will not be able to attend as I am taking part into the SIAM Conference on Uncertainty Quantification (UQ20), on the very same days. In a session on likelihood-free inference.

the three i’s of poverty

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , on September 15, 2019 by xi'an

Today I made a “quick” (10h door to door!) round trip visit to Marseille (by train) to take part in the PhD thesis defense (committee) of Edwin Fourrier-Nicolaï, which title was Poverty, inequality and redistribution: an econometric approach. While this was mainly a thesis in economics, meaning defending some theory on inequalities based on East German data, there were Bayesian components in the thesis that justified (to some extent!) my presence in the jury. Especially around mixture estimation by Gibbs sampling. (On which I started working almost exactly 30 years ago, when I joined Paris 6 and met  Gilles Celeux and Jean Diebolt.) One intriguing [for me] question stemmed from this defense, namely the notion of a Bayesian estimation of a three i’s of poverty (TIP) curve. The three i’s stand for incidence, intensity, and inequality, as, introduced in Jenkins and Lambert (1997), this curve measure the average income loss from the poverty level for the 100p% lower incomes, when p varies between 0 and 1. It thus depends on the distribution F of the incomes and when using a mixture distribution its computation requires a numerical cdf inversion to determine the income p-th quantile. A related question is thus on how to define a Bayesian estimate of the TIP curve. Using an average over the values of an MCMC sample does not sound absolutely satisfactory since the upper bound in the integral varies for each realisation of the parameter. The use of another estimate would however require a specific loss function, an issue not discussed in the thesis.