## mea culpa!

Posted in Books, Kids, R, Statistics, University life on October 9, 2017 by xi'an

An entry about our Bayesian Essentials book on X validated alerted me to a typo in the derivation of the Gaussian posterior..! When deriving the posterior (which was left as an exercise in the Bayesian Core), I just forgot the term expressing the divergence between the prior mean and the sample mean. Mea culpa!!!

## ghost [parameters] in the [Bayesian] shell

Posted in Books, Kids, Statistics on August 3, 2017 by xi'an

This question appeared on Stack Exchange (X Validated) two days ago. And the equalities indeed seem to suffer from several mathematical inconsistencies, as I pointed out in my answer. However, what I find most crucial in this question is that the quantity on the left hand side is meaningless. Parameters for different models only make sense within their own model. Hence, when comparing models, parameters cannot co-exist across models. What I suspect [without direct access to Kruschke’s Doing Bayesian Data Analysis book and as was later confirmed by John] is that he is using pseudo-priors in order to apply the Carlin and Chib (1995) resolution [by saturation of the parameter space] of simulating over a trans-dimensional space…

## Why should I be Bayesian when my model is wrong?

Posted in Books, pictures, Running, Statistics, Travel, University life on May 9, 2017 by xi'an

Guillaume Dehaene posted the above question on X validated last Friday. Here is an excerpt from it:

However, as everybody knows, assuming that my model is correct is fairly arrogant: why should Nature fall neatly inside the box of the models which I have considered? It is much more realistic to assume that the real model of the data p(x) differs from p(x|θ) for all values of θ. This is usually called a “misspecified” model.

My problem is that, in this more realistic misspecified case, I don’t have any good arguments for being Bayesian (i.e: computing the posterior distribution) versus simply computing the Maximum Likelihood Estimator.

Indeed, according to Kleijn and van der Vaart (2012), in the misspecified case, the posterior distribution converges as n→∞ to a Dirac distribution centred at the MLE but does not have the correct variance (unless two values just happen to be the same) in order to ensure that credible intervals of the posterior match confidence intervals for θ.

Which is a very interesting question…that may not have an answer (but that does not make it less interesting!)

A few thoughts about the meme that all models are wrong (resonating with last week's discussion):

1. While the hypothetical model is indeed almost invariably and irremediably wrong, it still makes sense to act in an efficient or coherent manner with respect to this model if this is the best one can do. The resulting inference produces an evaluation of the formal model that is the “closest” to the actual data generating model (if any);
2. There exist Bayesian approaches that can do without the model, a most recent example being the papers by Bissiri et al. (with my comments) and by Watson and Holmes (which I discussed with Judith Rousseau);
3. In a connected way, there exists a whole branch of Bayesian statistics dealing with M-open inference;
4. And yet another direction I like a lot is the SafeBayes approach of Peter Grünwald, who takes into account model misspecification to replace the likelihood with a down-graded version expressed as a power of the original likelihood.
5. The very recent Read Paper by Gelman and Hennig addresses this issue, albeit in a convoluted manner (and I added some comments on my blog).
6. In a sense, Bayesians should be the least concerned among statisticians and modellers about this aspect since the sampling model is to be taken as one of several prior assumptions and the outcome is conditional or relative to all those prior assumptions.
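To make the "power of the original likelihood" idea in item 4 concrete, here is a minimal sketch for a Normal mean with known variance and a conjugate Normal prior. The function name `power_posterior_normal` and the fixed exponent `eta` are illustrative assumptions: SafeBayes itself selects the exponent from the data, which this toy version does not attempt.

```python
def power_posterior_normal(xs, sigma2, prior_mean, prior_var, eta=1.0):
    """Posterior of a Normal mean when the likelihood is raised to the power eta.

    Hypothetical helper for illustration only: the data are N(theta, sigma2)
    with sigma2 known, the prior is N(prior_mean, prior_var), and eta is taken
    as given (eta = 1 is the standard posterior).
    Returns (posterior mean, posterior variance).
    """
    n = len(xs)
    # Raising the likelihood to the power eta multiplies the data precision by eta.
    precision = 1.0 / prior_var + eta * n / sigma2
    mean = (prior_mean / prior_var + eta * sum(xs) / sigma2) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(power_posterior_normal(data, 1.0, 0.0, 1.0, eta=1.0))
    print(power_posterior_normal(data, 1.0, 0.0, 1.0, eta=0.5))
```

Taking `eta` below one inflates the posterior variance and pulls the posterior mean back towards the prior mean, which is the sense in which the likelihood is down-graded.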

## an elegant result on exponential spacings

Posted in Statistics on April 19, 2017 by xi'an

A question on X validated I spotted on the train back from Lyon got me desperately seeking a reference in Devroye’s Generation Bible despite the abysmal wireless and a group of screeching urchins a few seats away from me… The question is about why

$\sum_{i=1}^{n}(Y_i - Y_{(1)}) \sim \text{Gamma}(n-1, 1)$

when the Y’s are standard exponentials. Since this reminded me immediately of exponential spacings, thanks to our Devroye fan-club reading group in Warwick, I tried to download Devroye’s Chapter V and managed after a few aborts (and a significant increase in decibels from the family corner). The result by Sukhatme (1937) is in plain sight as Theorem 2.3 and is quite elegant as it relies on the fact that

$\sum_{i=1}^n (y_i-y_{(1)})=\sum_{j=2}^n (n-j+1)(y_{(j)}-y_{(j-1)})=\sum_{j=2}^n (y_{(j)}-y_{(1)})$

hence sums up as a mere linear change of variables! (Pandurang Vasudeo Sukhatme (1911–1997) was an Indian statistician who worked on human nutrition and got the Guy Medal of the RSS in 1963.)
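As a quick numerical sanity check of this result, the following sketch computes the normalised spacings (n-j+1)(y_(j)-y_(j-1)), with the convention y_(0)=0, and simulates the statistic Σ(Y_i-Y_(1)). Dropping the first normalised spacing leaves n-1 terms whose sum should behave like a Gamma(n-1,1) variate, with mean and variance both equal to n-1. The function names, the sample size and the seed are arbitrary choices for illustration.

```python
import random


def shifted_sum(ys):
    """Return sum_i (y_i - y_(1)), the statistic in the question."""
    y_min = min(ys)
    return sum(y - y_min for y in ys)


def normalised_spacings(ys):
    """Return (n-j+1) * (y_(j) - y_(j-1)) for j = 1..n, with y_(0) = 0."""
    ordered = sorted(ys)
    n = len(ordered)
    previous = 0.0
    spacings = []
    for j, y in enumerate(ordered, start=1):
        spacings.append((n - j + 1) * (y - previous))
        previous = y
    return spacings


def simulate(n=5, reps=20000, seed=1):
    """Monte Carlo mean and variance of the statistic for standard exponentials."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ys = [rng.expovariate(1.0) for _ in range(n)]
        draws.append(shifted_sum(ys))
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    # Both values should be close to n - 1 = 4 for a Gamma(4, 1) variate.
    print(simulate())
```

The full set of normalised spacings adds up to Σy_i, while the spacings for j≥2 add up to Σ(y_i-y_(1)), which is the linear change of variables at work.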

## how large is 9!!!!!!!!!?

Posted in Statistics on March 17, 2017 by xi'an

This may sound like an absurd question [and in some sense it is!], but this came out of a recent mathematical riddle on The Riddler, asking for the largest number one could write with ten symbols. The difficulty with this riddle is the definition of a symbol, as the collection of available symbols is a very relative concept. For instance, if one takes the symbols available on a basic pocket calculator, besides the 10 digits and the decimal point, there should be the four basic operations plus square root and square, which means that presumably 999999999² is the largest one can write. On a cell phone, there are already many more operations, for instance my phone includes the factorial operator and hence 9!!!!!!!!! is a good guess. When moving to a computer, the problem becomes somewhat meaningless, both because very few software packages handle infinite precision computing, hence very large numbers are not achievable without additional coding, and because it very much depends on the environment and on which numbers and symbols are already coded in the local language, as illustrated by this X validated answer, this link from The Riddler, and the xkcd entry below. (The solution provided by The Riddler itself is not particularly relevant as it relies on a particular collection of symbols, which means Rado’s number BB(9999!) is also a solution within the right referential.)
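To get a feeling for the sizes involved, here is a small sketch that counts the decimal digits of n! through the log-gamma function, without ever computing n! itself. It assumes that 9!!!!!!!!! is read as nine successive applications of the factorial (and not as a multifactorial), and the helper name `digits_of_factorial` is made up for the occasion. Since it relies on floating-point arithmetic, the count could be off by one when n! sits very close to a power of ten.

```python
import math


def digits_of_factorial(n):
    """Number of decimal digits of n!, using log10(n!) = lgamma(n + 1) / ln(10)."""
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1


if __name__ == "__main__":
    first = math.factorial(9)          # 9! is small enough to compute exactly
    print(first)
    # Digit count of (9!)!, already far too long to print in full:
    print(digits_of_factorial(first))
```

Only the first two factorials can be handled this way: (9!)! is itself far beyond the range of double-precision floats, so the digit count of the third factorial cannot even be stored as a float, and there are six more factorials to go.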

## what does more efficient Monte Carlo mean?

Posted in Books, Kids, R, Statistics on March 17, 2017 by xi'an

“I was just thinking that there might be a magic trick to simulate directly from this distribution without having to go for less efficient methods.”

In a simple question on X validated a few days ago [about simulating from x²φ(x)] popped up the remark that the person asking the question wanted a direct simulation method for higher efficiency, compared with an accept-reject solution. Which shows a misunderstanding of what “efficiency” means in Monte Carlo settings. If it means anything, I would think it is reflected in the average time taken to return one simulation and possibly in the worst case. But there is no reason to call an inverse cdf method more efficient than an accept-reject or a transform approach since it all depends on the time it takes to make the inversion compared with the other solutions… Since inverting the closed-form cdf in this example is much more expensive than generating a Gamma(3/2,½) variate and taking plus or minus its square root, the inverse cdf method is certainly not the more efficient one here. Maybe a ziggurat method could be devised, especially since x²φ(x)<φ(x) when |x|<1, but I am not sure it is worth the effort!
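The transform approach can be sketched in a few lines. If X has density x²φ(x), then Y=X² has density proportional to √y·exp(-y/2), that is, a Gamma distribution with shape 3/2 and rate ½ (a χ² with three degrees of freedom), and the sign of X is symmetric. The function name `sample_x2phi` is an illustrative choice; note that Python's `random.gammavariate` takes a scale rather than a rate, hence the value 2.0 below.

```python
import math
import random


def sample_x2phi(rng):
    """Draw one value from the density x^2 * phi(x), phi the standard Normal pdf."""
    # X^2 ~ Gamma(shape 3/2, rate 1/2); gammavariate expects (shape, scale = 1/rate).
    y = rng.gammavariate(1.5, 2.0)
    # The density is symmetric around zero, so attach a fair random sign.
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * math.sqrt(y)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample_x2phi(rng) for _ in range(20000)]
    # The second moment should be close to 3, the mean of a chi-square with 3 df.
    print(sum(x * x for x in xs) / len(xs))
```

Timing this against a numerical inversion of the cdf would be the actual efficiency comparison, which this sketch does not carry out.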

## multiplying a Gaussian matrix and a Gaussian vector

Posted in Books on March 2, 2017 by xi'an

This arXived note by Pierre-Alexandre Mattei was actually inspired by one of my blog entries, itself written from a resolution of a question on X validated. The original result about the Laplace distribution actually dates back at least to 1932 and a paper by Wishart and Bartlett! I am not sure the construct has clear statistical implications, but it is nonetheless a good calculus exercise.

The note produces an extension to the multivariate case. Where the Laplace distribution is harder to define, in that multiple constructions are possible. The current paper opts for a definition based on the characteristic function. Which leads to a rather unsavoury density with Bessel functions. It however satisfies the constructive definition of being a multivariate Normal multiplied by a χ variate plus a constant vector multiplied by the same squared χ variate. It can also be derived as the distribution of

Wy+||y||²μ

when W is a (p,q) matrix with iid Gaussian columns and y is a Gaussian vector with independent components. And μ is a vector of the proper dimension. When μ=0 the marginals remain Laplace.
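A small simulation can illustrate the construction. The sketch below makes assumptions that are not spelled out above: W has iid standard Normal entries, y is a standard Normal vector, and q=2, in which case each coordinate of Wy (with μ=0) is a sum of two products of independent standard Normals and hence a standard Laplace variate, with E|X|=1 and variance 2. The function name and its arguments are illustrative.

```python
import random


def gaussian_matrix_times_vector(p, q, rng, mu=None):
    """Return one draw of W y + ||y||^2 mu.

    Assumptions for illustration: W is p x q with iid N(0, 1) entries,
    y has q iid N(0, 1) components, and mu defaults to the zero vector.
    """
    if mu is None:
        mu = [0.0] * p
    y = [rng.gauss(0.0, 1.0) for _ in range(q)]
    squared_norm = sum(v * v for v in y)
    draw = []
    for i in range(p):
        row = [rng.gauss(0.0, 1.0) for _ in range(q)]
        draw.append(sum(w * v for w, v in zip(row, y)) + squared_norm * mu[i])
    return draw


if __name__ == "__main__":
    rng = random.Random(3)
    first = [gaussian_matrix_times_vector(3, 2, rng)[0] for _ in range(20000)]
    # For a standard Laplace marginal, E|X| is 1 and the variance is 2.
    print(sum(abs(x) for x in first) / len(first))
    print(sum(x * x for x in first) / len(first))
```

Passing a non-zero `mu` shifts each coordinate by a multiple of the same ||y||², which is what makes the resulting distribution asymmetric.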