After a rather long interlude, and just in time for the six month deadline!, we (Gilles Celeux, Mohammed El Anbari, Jean-Michel Marin and myself) have resubmitted (and rearXived) our comparative study of Bayesian and non-Bayesian variable selections procedures to Bayesian Analysis. Why it took us so long is a combination of good and bad reasons: besides being far apart, between Morocco, Paris and Montpellier, and running too many projects at once with Jean-Michel (including the Bayesian Core revision that did not move much since last summer!), we came to realise that my earlier strong stance that invariance on the intercept did not matter was not right and that the (kind) reviewers were correct about the asymptotic impact of the scale of the intercept on the variable selection, so we had first to reconvene and think about it, before running another large round of simulations. We hope the picture is now clearer.
Archive for invariance
On Monday, I took part in the jury of the PhD thesis of Anabel Forte Deltel, in the department of statistics of the Universitat de València. The topic of the thesis was variable selection in Gaussian linear models using an objective Bayes approach. Completely on my own research agenda! I had already discussed with Anabel in Zürich, where she gave a poster and gave me a copy of her thesis, so could concentrate on the fundamentals of her approach during the defense. Her approach extends Liang et al. (2008, JASA) hyper-g prior in a complete analysis of the conditions set by Jeffreys in his book for constructing such priors. She is therefore able to motivate a precise value for most hyperparameters (although some choices were mainly based on computational reasons opposing 2F1 with Appell’s F1 hypergeometric functions). She also defends the use of an improper prior by an invariance argument that leads to the standard Jeffreys’ prior on location-scale. (This is where I prefer the approach in Bayesian Core that does not discriminate between a subset of the covariates including the intercept and the other covariates. Even though it is not invariant by location-scale transforms.) After the defence, Jim Berger pointed out to me that the modelling allowed for the subset to be empty, which would then cancel my above objection! In conclusion, this thesis could well set a reference prior (if not in José Bernardo’s sense of the term!) for Bayesian linear model analysis in the coming years.
The conference in honour of Larry Brown was quite exciting, with lots of old friends gathered in Philadelphia and lots of great talks either recollecting major works of Larry and coauthors or presenting fairly interesting new works. Unsurprisingly, a large chunk of the talks was about admissibility and minimaxity, with John Hartigan starting the day re-reading Larry masterpiece 1971 paper linking admissibility and recurrence of associated processes, a paper I always had trouble studying because of both its depth and its breadth! Bill Strawderman presented a new if classical minimaxity result on matrix estimation and Anirban DasGupta some large dimension consistency results where the choice of the distance (total variation versus Kullback deviance) was irrelevant. Ed George and Susie Bayarri both presented their recent work on g-priors and their generalisation, which directly relate to our recent paper on that topic. On the afternoon, Holger Dette showed some impressive mathematics based on Elfving’s representation and used in building optimal designs. I particularly appreciated the results of a joint work with Larry presented by Robert Wolpert where they classified all Markov stationary infinitely divisible time-reversible integer-valued processes. It produced a surprisingly small list of four cases, two being trivial.. The final talk of the day was about homology, which sounded a priori rebutting, but Robert Adler made it extremely entertaining, so much that I even failed to resent the powerpoint tricks! The next morning, Mark Low gave a very emotional but also quite illuminating about the first results he got during his PhD thesis at Cornell (completing the thesis when I was using Larry’s office!). Brenda McGibbon went back to the three truncated Poisson papers she wrote with Ian Johnstone (via gruesome 13 hour bus rides from Montréal to Ithaca!) and produced an illuminating explanation of the maths at work for moving from the Gaussian to the Poisson case in a most pedagogical and enjoyable fashion. Larry Wasserman explained the concepts at work behind the lasso for graphs, entertaining us with witty acronyms on the side!, and leaving out about 3/4 of his slides! (The research group involved in this project produced an R package called huge.) Joe Eaton ended up the morning with a very interesting result showing that using the right Haar measure as a prior leads to a matching prior, then showing why the consequences of the result are limited by invariance itself. Unfortunately, it was then time for me to leave and I will miss (in both meanings of the term) the other half of the talks. Especially missing Steve Fienberg’s talk for the third time in three weeks! Again, what I appreciated most during those two days (besides the fact that we were all reunited on the very day of Larry’s birthday!) was the pain most speakers went to to expose older results in a most synthetic and intuitive manner… I also got new ideas about generalising our parallel computing paper for random walk Metropolis-Hastings algorithms and for optimising across permutation transforms.
A new arXived note by Berger and Hill discusses how [my favourite probability introduction] Feller’s Introduction to Probability Theory (volume 2) gets Benford’s Law “wrong”. While my interest in Benford’s Law is rather superficial, I find the paper of interest as it shows a confusion between different folk theorems! My interpretation of Benford’s Law is that the first significant digit of a random variable (in a basis 10 representation) is distributed as
and not that is uniform, which is the presentation given in the arXived note…. The former is also the interpretation of William Feller (page 63, Introduction to Probability Theory), contrary to what the arXived note seems to imply on page 2, but Feller indeed mentioned as an informal/heuristic argument in favour of Benford’s Law that when the spread of the rv X is large, is approximately uniformly distributed. (I would no call this a “fundamental flaw“.) The arXived note is then right in pointing out the lack of foundation for Feller’s heuristic, if muddling the issue by defining several non-equivalent versions of Benford’s Law. It is also funny that this arXived note picks at the scale-invariant characterisation of Benford’s Law when Terry Tao’s entry represents it as a special case of Haar measure!
In connection with an earlier post on Benford’s Law, i.e. the probability that the first digit of a random variable X isis approximately—you can easily check that the sum of those probabilities is 1—, I want to signal a recent entry on Terry Tiao’s impressive blog. Terry points out that Benford’s Law is the Haar measure in that setting, but he also highlights a very peculiar absorbing property which is that, iffollows Benford’s Law, thenalso follows Benford’s Law for any random variablethat is independent from… Now, the funny thing is that, if you take a normal sampleand check whether or not Benford’s Law applies to this sample, it does not. But if you take a second normal sampleand consider the product sample, then Benford’s Law applies almost exactly. If you repeat the process one more time, it is difficult to spot the difference. Here is the [rudimentary—there must be a more elegant way to get the first significant digit!] R code to check this:
x=abs(rnorm(10^6)) b=trunc(log10(x)) -(log(x)<0) plot(hist(trunc(x/10^b),breaks=(0:9)+.5)$den,log10((2:10)/(1:9)), xlab="Frequency",ylab="Benford's Law",pch=19,col="steelblue") abline(a=0,b=1,col="tomato",lwd=2) x=abs(rnorm(10^6)*x) b=trunc(log10(x)) -(log(x)<0) points(hist(trunc(x/10^b),breaks=(0:9)+.5,plot=F)$den,log10((2:10)/(1:9)), pch=19,col="steelblue2") x=abs(rnorm(10^6)*x) b=trunc(log10(x)) -(log(x)<0) points(hist(trunc(x/10^b),breaks=(0:9)+.5,plot=F)$den,log10((2:10)/(1:9)), pch=19,col="steelblue3")
Even better, if you change rnorm to another generator like rcauchy or rexp at any of the three stages, the same pattern occurs.