Archive for normal model

a neat EM resolution

Posted in Books, Kids, Statistics, University life with tags , , , , , , on February 3, 2021 by xi'an

Read (and answered) this question on X validation about finding the maximum likelihood estimator of a 2×2 Gaussian covariance matrix when some observations are partly missing.  The neat thing is that, in this case, the maximisation step is identical to the maximum likelihood estimation of the 2×2 Gaussian covariance matrix by redefining the empirical covariance matrix into Z and maximising

-n\log|\Sigma|-\text{trace}(Z\Sigma^{-1})

in Σ. Nothing involved but fun to explain, nonetheless. (In my final exam this year, no student even approached the EM questions!)

visualising bias and unbiasedness

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , , on April 29, 2019 by xi'an

A question on X validated led me to wonder at the point made by Christopher Bishop in his Pattern Recognition and Machine Learning book about the MLE of the Normal variance being biased. As it is illustrated by the above graph that opposes the true and green distribution of the data (made of two points) against the estimated and red distribution. While it is true that the MLE under-estimates the variance on average, the pictures are cartoonist caricatures in their deviance permanence across three replicas. When looking at 10⁵ replicas, rather than three, and at samples of size 10, rather than 2, the distinction between using the MLE (left) and the unbiased estimator of σ² (right).

When looking more specifically at the case n=2, the humongous variability of the density estimate completely dwarfs the bias issue:

Even when averaging over all 10⁵ replications, the difference is hard to spot (and both estimations are more dispersed than the truth!):

Bayesian GANs [#2]

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , , , on June 27, 2018 by xi'an

As an illustration of the lack of convergence of the Gibbs sampler applied to the two “conditionals” defined in the Bayesian GANs paper discussed yesterday, I took the simplest possible example of a Normal mean generative model (one parameter) with a logistic discriminator (one parameter) and implemented the scheme (during an ISBA 2018 session). With flat priors on both parameters. And a Normal random walk as Metropolis-Hastings proposal. As expected, since there is no stationary distribution associated with the Markov chain, simulated chains do not exhibit a stationary pattern,

And they eventually reach an overflow error or a trapping state as the log-likelihood gets approximately to zero (red curve).

Too bad I missed the talk by Shakir Mohammed yesterday, being stuck on the Edinburgh by-pass at rush hour!, as I would have loved to hear his views about this rather essential issue…

copy code at your own peril

Posted in Books, Kids, R, Statistics, University life with tags , , , , , on November 14, 2016 by xi'an

I have come several times upon cases of scientists [I mean, real, recognised, publishing, senior scientists!] from other fields blindly copying MCMC code from a paper or website, and expecting the program to operate on their own problem… One illustration is from last week, when I read a X Validated question [from 2013] about an attempt of that kind, on a rather standard Normal posterior, but using an R code where the posterior function was not even defined. (I foolishly replied, despite having no expectation of a reply from the person asking the question.)

an ABC experiment

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , on November 24, 2014 by xi'an

 

ABCmadIn a cross-validated forum exchange, I used the code below to illustrate the working of an ABC algorithm:

#normal data with 100 observations
n=100
x=rnorm(n)
#observed summaries
sumx=c(median(x),mad(x))

#normal x gamma prior
priori=function(N){
 return(cbind(rnorm(N,sd=10),
  1/sqrt(rgamma(N,shape=2,scale=5))))
}

ABC=function(N,alpha=.05){

  prior=priori(N) #reference table

  #pseudo-data
  summ=matrix(0,N,2)
  for (i in 1:N){
   xi=rnorm(n)*prior[i,2]+prior[i,1]
   summ[i,]=c(median(xi),mad(xi)) #summaries
   }

  #normalisation factor for the distance
  mads=c(mad(summ[,1]),mad(summ[,2]))
  #distance
  dist=(abs(sumx[1]-summ[,1])/mads[1])+
   (abs(sumx[2]-summ[,2])/mads[2])
  #selection
  posterior=prior[dist<quantile(dist,alpha),]}

Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than when using the mean and the variance as summary statistics. This is not completely surprising in that the latter are sufficient, while the former are not. Still, the (-10,10) range on the mean is way larger… The second reason for surprise is that the true posterior distribution cannot be derived since the joint density of med and mad is unavailable.

sufvsinsufAfter thinking about this for a while, I went back to my workbench to check the difference with using mean and variance. To my greater surprise, I found hardly any difference! Using the almost exact ABC with 10⁶ simulations and a 5% subsampling rate returns exactly the same outcome. (The first row above is for the sufficient statistics (mean,standard deviation) while the second row is for the (median,mad) pair.) Playing with the distance does not help. The genuine posterior output is quite different, as exposed on the last row of the above, using a basic Gibbs sampler since the posterior is not truly conjugate.