Archive for moments

weakly informative reparameterisations

Posted in Books, pictures, R, Statistics, University life on February 14, 2018 by xi'an

Our paper, weakly informative reparameterisations of location-scale mixtures, with Kaniav Kamary and Kate Lee, got accepted by JCGS! Great news, which comes in perfect timing for Kaniav as she is currently applying for positions. The paper proposes a Bayesian modelling of unidimensional mixtures based on first and second moment constraints, since these turn the remainder of the parameter space into a compact set. While we had already developed an associated R package, Ultimixt, the current editorial policy of JCGS requires that the R code used to produce all results be attached to the submission, and it took us a few more weeks than it should have to produce directly executable code, due to internal library incompatibilities. (For this entry, I was looking for a link to our special JCGS issue with my picture of Edinburgh but realised I did not have this picture.)
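
To give a flavour of the moment-constraint idea, here is a minimal sketch for two Gaussian components (an illustration with hypothetical names, not the exact parameterisation of the paper nor the Ultimixt interface): once the overall mean μ and variance σ² are set, the remaining free parameters all live in compact sets.

moment_mix <- function(mu, s2, w, phi, xi, s = 1){
  # w   in (0,1): weight of the first component
  # phi in (0,1): share of s2 given to the spread of the component means
  # xi  in (0,1): split of the remaining variance between the components
  # s   in {-1,1}: sign of the first component's mean shift
  g1 <- s*sqrt(phi*(1 - w)/w)    # standardised mean shifts, solving
  g2 <- -s*sqrt(phi*w/(1 - w))   # w*g1+(1-w)*g2 = 0, w*g1^2+(1-w)*g2^2 = phi
  v1 <- xi*s2*(1 - phi)/w        # within-component variances, solving
  v2 <- (1 - xi)*s2*(1 - phi)/(1 - w)  # w*v1+(1-w)*v2 = s2*(1-phi)
  list(weights = c(w, 1 - w),
       means = mu + sqrt(s2)*c(g1, g2),
       sds = sqrt(c(v1, v2)))
}

# by construction the mixture mean and variance are mu and s2:
p <- moment_mix(mu = 0, s2 = 1, w = .3, phi = .5, xi = .4)
sum(p$weights*p$means)                # 0 (up to rounding)
sum(p$weights*(p$sds^2 + p$means^2))  # 1 (since mu = 0)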

occupancy rules

Posted in Kids, R, Statistics on May 23, 2016 by xi'an

While the last riddle on The Riddler was rather anticlimactic, asking merely for the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution

\mathbb{E}[Y]=n(1-\frac{1}{n})^m,

[which still has a link with e in that the fraction of empty bins converges to e⁻¹ when n=m], this led me to some more involved investigation of the distribution of Y. While it can be shown directly that the probability that exactly k bins are non-empty is

{n \choose k}\sum_{i=1}^k (-1)^{k-i}{k \choose i}(i/n)^m

with an R implementation given by

miss <- function(n, m){
  # p[k] = P(exactly k bins are non-empty) times n^m, by inclusion-exclusion
  p = rep(0, n)
  for (k in 1:n)
    p[k] = choose(n, k)*sum((-1)^((k - 1):0)*choose(k, 1:k)*(1:k)^m)
  # reverse so the output is P(Y=0), ..., P(Y=n-1), with Y = n - k empty bins
  return(rev(p)/n^m)
}

I wanted to take advantage of the moments of Y, since it can be written as a sum of n indicators, counting the number of empty cells. However, the higher moments of Y are not as straightforward as its expectation and I struggled with the representation until I came upon this formula

\mathbb{E}[Y^k]=\sum_{i=1}^k {n \choose i}\, i!\, S(k,i) \left( 1-\frac{i}{n}\right)^m

where S(k,i) denotes the Stirling number of the second kind… Or, equivalently, i!S(k,i) is the number of surjections from a set of size k onto a set of size i. This leads to the distribution of Y by inverting the moment equations, as in the following R code:

# requires Stirling2() from the copula package
library(copula)

diss <- function(n, m){
  # solve the moment equations A q = mome for q(j) = P(Y = j-1), j = 1, ..., n
  # (the Vandermonde matrix A gets ill-conditioned for large n)
  A = matrix(0, n, n)
  mome = rep(0, n)
  A[n, ] = rep(1, n)  # last equation: probabilities sum to one
  mome[n] = 1
  for (k in 1:(n-1)){
    A[k, ] = (0:(n-1))^k
    # E[Y^k] = sum_i choose(n,i) i! S(k,i) (1 - i/n)^m
    for (i in 1:k)
      mome[k] = mome[k] + choose(n, i)*factorial(i)*
        Stirling2(k, i)*(1 - i/n)^m
  }
  return(solve(A, mome))
}

which I nonetheless checked by raw simulation from the multinomial:

zample <- function(n, m, T = 1e4){
  # T replications of m uniform draws over n bins
  x = matrix(sample(1:n, m*T, rep = TRUE), nrow = T)
  # number of distinct (i.e., non-empty) bins in each replication
  x = apply(x, 1, function(row) length(unique(row)))
  return(n - x)  # numbers of empty bins
}
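
As a quick check that the three representations agree (with n and m arbitrarily chosen for illustration):

n <- 5; m <- 7
round(miss(n, m), 4)  # P(Y=0), ..., P(Y=n-1) from the direct formula
round(diss(n, m), 4)  # same, from the inverted moment equations
round(table(factor(zample(n, m), levels = 0:(n - 1)))/1e4, 4)  # simulation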

borderline infinite variance in importance sampling

Posted in Books, Kids, Statistics on November 23, 2015 by xi'an

As I was still musing about last week's posts on infinite variance importance sampling and its potential corrections, in conjunction with Aki's post, I wondered whether or not there was a fundamental difference between "just" having a [finite] variance and "just" having none. To get a better feeling, I ran a quick experiment with Exp(1) as the target and Exp(a) as the importance distribution. When estimating E[X]=1, the above graph opposes a=1.95 to a=2.05 (variance versus no variance, bright yellow versus wheat), a=2.95 to a=3.05 (third moment versus none, bright yellow versus wheat), and a=3.95 to a=4.05 (fourth moment versus none, bright yellow versus wheat). The graph below is the same for the estimation of E[exp(X/2)]=2, whose integrand is not square integrable under the target and hence seems to require higher moments for the importance weight. It is hard to derive universal theories from these two graphs; however, they show the protection that higher moments provide against sudden drifts in the estimation sequence.

As an aside [not really!], apart from our rather confidential Confidence bands for Brownian motion and applications to Monte Carlo simulation with Wilfrid Kendall and Jean-Michel Marin, I do not know of many studies that consider the sequence of averages time-wise rather than across realisations at a given time, and I still think this is a more relevant perspective for simulation purposes.

[second graph: the same running averages for the estimation of E[exp(X/2)]=2]
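
For readers wishing to rerun a version of the first experiment, here is a minimal sketch (sample size, seed, and plotting details are arbitrary choices of mine) for the borderline pair a=1.95 versus a=2.05, on either side of the finite-variance boundary:

# running (time-wise) IS estimates of E[X]=1, target Exp(1), proposal Exp(a);
# the integrand times the importance weight is square integrable only if a<2
set.seed(1)
N <- 1e5
par(mfrow = c(1, 2))
for (a in c(1.95, 2.05)){
  x <- rexp(N, rate = a)
  w <- dexp(x, rate = 1)/dexp(x, rate = a)  # importance weights
  plot(cumsum(w*x)/(1:N), type = "l", xlab = "iterations",
       ylab = "estimate", main = paste("a =", a))
  abline(h = 1, lty = 2)  # true value E[X] = 1
}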

truncated t’s [typo]

Posted in pictures, Statistics on March 14, 2014 by xi'an

Last night, I received this email from Piero Foscari (in Hamburg) about my moment derivations for the absolute and the positive t distributions:

There might be two typos in the final second moment formula and its derivation (assuming no silly symmetric mistakes in my validation code): the first ν ought to be -ν, and there should be a corresponding scaling factor also for the boundary μ in Pμ,ν-2 since it arises from a change of variable. Btw in the text reference to Fig. 2 |X| wasn't updated to X+. I hope that this is of some use.

and I checked that indeed I had forgotten the scale factor ν/(ν-2) in the t distribution with ν-2 degrees of freedom as well as the sign… So I modified the note and rearXived it. Sorry about this lack of attention to the derivation!
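
As a side note, the ν/(ν-2) factor itself is easy to check numerically (a quick sketch of the untruncated second moment, not of the corrected formula in the note):

# second moment of a standard t with nu degrees of freedom is nu/(nu-2)
nu <- 5
integrate(function(x) x^2*dt(x, df = nu), -Inf, Inf)$value  # about 5/3
nu/(nu - 2)
# and for the positive part X+, by symmetry, half of that
integrate(function(x) x^2*dt(x, df = nu), 0, Inf)$value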

back to moments

Posted in Statistics, University life on March 23, 2012 by xi'an

A recent paper posted on arXiv considers afresh the method of moments for mixtures of distributions. ("Afresh", because the method was introduced by Karl Pearson in the 1890s…) The authors (Animashree Anandkumar, Daniel Hsu, and Sham Kakade) estimate the parameters of a mixture of multinomial distributions (motivated as a "bag of words" document-topic model) via the moment representation of pairwise and triple-wise probabilities. The estimate is obtained by a simple matrix formula using the empirical frequencies for pairs and triplets. The principle also applies to non-multinomial mixtures with components that are defined/parameterised by their means (or rather first moments?), like Gaussian mixtures.

This is neat, but there are a few caveats: (1) contrary to standard mixtures, the paper assumes that þ observations are made at once from a given component: in other words, components are drawn at random according to a multinomial distribution, then þ observations are generated from this given component. (This is rather unusual, esp. given that þ is the same across all samples. It should be feasible to extend the results in the paper to varying þ‘s…) (2) while the pairwise and triplewise statistics remain low order moments, avoiding the criticism raised against Pearson’s original estimator, those pairwise and even more triplewise frequency estimators are quickly getting poor as the number d of words in the vocabulary/dimension of the parameter increases, since there should be more and more zeros. (For a D dimensional Gaussian mixture with both mean and covariance matrix unknown, the authors consider the dimension is D/þ but this seems strange given the D+D²/2 parameters to estimate for each component…)