## email footprint

Posted in Travel, University life on September 14, 2019 by xi'an

While I was wondering (in Salzburg) about the carbon impact of sending emails with an endless cascade of the past history of exchanges and replies, I found this (rather rudimentary) assessment that, while standard emails have an average impact of 4g, those with long attachments may cost 50g, quoting from Berners-Lee, leading to the fairly astounding figure of an evaluated impact of 1.6kg a day, or more than half a ton per year! Quite amazing when considering that a round flight Paris-Birmingham produces 80kg. Hence justifying a posteriori my habit of removing earlier emails when replying to them. (It takes little effort to do so, especially in mailers where this feature can be set as the default option.)
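The back-of-the-envelope arithmetic behind these figures (not spelled out in the assessment) is easy to check:

```python
# Sanity check of the quoted carbon figures: 4g per standard email
# (from Berners-Lee) and a daily impact of 1.6kg.
STANDARD_EMAIL_G = 4.0    # grams of CO2 per plain email
DAILY_IMPACT_KG = 1.6     # quoted daily impact
FLIGHT_KG = 80.0          # round flight Paris-Birmingham

emails_per_day = DAILY_IMPACT_KG * 1000 / STANDARD_EMAIL_G
yearly_impact_kg = DAILY_IMPACT_KG * 365

print(emails_per_day)               # 400 standard emails a day
print(yearly_impact_kg)             # 584 kg, i.e. more than half a ton
print(yearly_impact_kg / FLIGHT_KG) # roughly 7 such flights a year
```

So the 1.6kg daily figure corresponds to about 400 standard emails a day, which indeed compounds to well over half a ton per year.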

Posted in Books, Kids, Statistics, University life on December 5, 2017 by xi'an

I am a PhD student in biostatistics, and an avid reader of your work. I recently came across this blog post, where you review a text on statistical paradoxes, and I was struck by this section:

“For instance, the author considers the MLE being biased to be a paradox (p.117), while omitting the much more substantial “paradox” of the non-existence of unbiased estimators of most parameters—which simply means unbiasedness is irrelevant. Or the other even more puzzling “paradox” that the secondary MLE derived from the likelihood associated with the distribution of a primary MLE may differ from the primary. (My favourite!)”

I found this section provocative, but I am unclear on the nature of these “paradoxes”. I reviewed my stat inference notes and came across the classic example that there is no unbiased estimator for 1/p w.r.t. a binomial distribution, but I believe you are getting at a much more general result. If it’s not too much trouble, I would sincerely appreciate it if you could point me in the direction of a reference or provide a bit more detail for these two “paradoxes”.

The text is Chang’s Paradoxes in Scientific Inference, which I indeed reviewed negatively. To answer about the bias “paradox”: it is indeed a neglected fact that, while the average of any transform of a sample obviously is an unbiased estimator of its mean (!), the converse does not hold, namely an arbitrary transform of the model parameter θ does not necessarily enjoy an unbiased estimator. In Lehmann and Casella, Chapter 2, Section 4, this issue is (just slightly) discussed. But essentially, the transforms that lead to unbiased estimators are mostly the polynomial transforms of the mean parameters… (This also somewhat connects to a recent X validated question as to why MLEs are not always unbiased, although the simplest explanation is that the transform of the MLE is the MLE of the transform!) In exponential families, I would deem the range of transforms with unbiased estimators closely related to the collection of functions that allow for inverse Laplace transforms, although I cannot quote a specific result on this hunch.
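A quick numerical illustration (not from the post, but a classic) of why unbiasedness can be irrelevant: for a single observation X ~ Poisson(λ), the only unbiased estimator of exp(-2λ) is (-1)^X, an absurd estimator that only takes the values +1 and -1 for a quantity lying in (0,1).

```python
import math

# For X ~ Poisson(lam), E[(-1)^X] = sum_k (-1)^k e^{-lam} lam^k / k!
#                                 = e^{-lam} e^{-lam} = e^{-2 lam},
# so the "estimator" (-1)^X is unbiased for exp(-2*lam), despite never
# taking a value inside (0, 1).
def expectation_sign(lam, kmax=60):
    """Exact expectation of (-1)^X, truncating the Poisson series at kmax."""
    return sum((-1) ** k * math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(kmax))

lam = 1.3
print(expectation_sign(lam))   # ≈ exp(-2 * 1.3) ≈ 0.0743
print(math.exp(-2 * lam))
```

The plug-in MLE exp(-2X) is biased but at least lands in the parameter space, which is the sense in which unbiasedness is a red herring.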

The other “paradox” is that, if h(X) is the MLE of the model parameter θ for the observable X, the distribution of h(X) has a density different from the density of X and, hence, its maximisation in the parameter θ may differ. An example (my favourite!) is the MLE of ||a||² based on x ~ N(a,I), which is ||x||², a poor estimate, and which (strongly) differs from the MLE of ||a||² based on ||x||², which is close to (1-p/||x||²)²||x||² and (nearly) admissible [as discussed in The Bayesian Choice].

## can you help?

Posted in Statistics, University life on October 12, 2013 by xi'an

An email received a few days ago:

I want to compare the predictive power of a non-Bayesian model (GWR, geographically weighted regression) and a Bayesian hierarchical model (spLM).
For GWR, DIC is not defined, but AIC is.
For spLM, AIC is not defined, but DIC is.

How can I compare the predictive ability of these two models? Does it make sense to compare AIC of one with DIC of the other?

I did not reply as the answer is in the question: the numerical values of AIC and DIC do not compare. And since one estimation is Bayesian while the other is not, I do not think the predictive abilities can be compared either. This is not even mentioning my reluctance to use DIC, as renewed in yesterday’s post.

Posted in Books, Statistics, University life on October 16, 2012 by xi'an

Here is an email I got yesterday:

Voila ma question: Comment programmer avec le Matlab la fonction de densité a posteriori (n’est pas de type connu qui égale au produit la fonction de vraisemblance et la densité de la loi a priori) pour calculer la valeur de cette fonction en un point theta=x (theta est le paramètre a estimer) en fixant les autres paramètres.

Here is my question: how to program in Matlab the posterior density function (which is not of a known type and which equals the product of the likelihood function and the prior density) in order to compute the value of this function at a point theta = x (theta being the parameter to estimate), while keeping the other parameters fixed.

which is a bit naïve, especially the Matlab part… I answered that the programming issue was rather straightforward once both the prior density function and the likelihood function could be computed. (With Matlab or any other language.)
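To make the answer concrete, here is a minimal sketch (in Python rather than Matlab) of what was meant: once the prior density and the likelihood can be evaluated, the unnormalised posterior at a point theta is just their product, best handled on the log scale for numerical stability. The normal likelihood and prior below are illustrative assumptions, not taken from the email.

```python
import math

def log_posterior(theta, data, prior_mean=0.0, prior_sd=10.0):
    """Unnormalised log-posterior: log prior + log likelihood,
    for an (assumed) N(prior_mean, prior_sd^2) prior and N(theta, 1) data."""
    log_prior = (-0.5 * ((theta - prior_mean) / prior_sd) ** 2
                 - math.log(prior_sd * math.sqrt(2 * math.pi)))
    log_lik = sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)
                  for x in data)
    return log_prior + log_lik

data = [1.2, 0.8, 1.5]
print(log_posterior(1.0, data))   # posterior value (up to a constant) at theta = 1
```

The missing normalising constant is irrelevant for most purposes (MCMC, posterior mode, relative comparisons), which is why evaluating the product suffices.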

## computational difficulties [with notations]

Posted in R, Statistics, University life on August 25, 2011 by xi'an

Here is an email I received from Umberto:

I have a doubt regarding the tempered transitions method you considered in your JASA article with Celeux and Hurn.

On page 961 you detail the several steps for building a proposal for a given distribution by simulating through l tempered power densities. I am slightly confused regarding the interpretation of your MCMC(x,π) notation.

For example, does $MCMC(y_l,\pi^{1/\beta_{l-1}})$ mean that an MCMC procedure starting at $y_l$, say Metropolis-Hastings, is used to generate a single proposal $y_{l+1}$ for $\pi^{1/\beta_{l-1}}$?

If this is the case, then $y_{l+1}$ might be rejected or accepted, and in the former case I would have $y_{l+1}=y_l$, right? In other words, I am not required to simulate proposals using $MCMC(y_l,\pi^{1/\beta_{l-1}})$ until I finally accept $y_{l+1}$.

By reading the last paragraph on page 962 it seems to me that, indeed, the $y_1,\ldots,y_{2l-1}$ thus generated are not necessarily accepted proposals for the corresponding power densities.

In retrospect, I still like this MCMC(x,π) notation in the simulated tempering “up-and-down” scheme (and the paper!), because it is generic, in the sense of an R function that would take the function MCMC(x,π) as its input. To clarify the notation in this light, MCMC(x,π) returns a value that is the outcome of the corresponding MCMC step: this value may be equal to x (MCMC rejection) or to another value (MCMC acceptance). So the sequence $y_1,\ldots,y_{2l-1}$ is made of consecutive values that differ and of consecutive values that do not (it is even possible that all the terms in the sequence are equal). At the end of this “up-and-down” tempering, the value $y_{2l-1}$ may become the next value of the Markov chain targeting the original distribution π, or the current value may be replicated instead, depending on the overall acceptance probability (4) on page 961 (following Neal, 1996, Statistics and Computing). This is a very compelling idea, whose mileage may vary depending on the number of required steps and powers.
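The genericity of the MCMC(x,π) notation can be sketched as code (in Python here rather than R): a function taking the current value and a target, and returning either the accepted proposal or the current value itself. The random-walk proposal, its step size, and the power ladder are illustrative assumptions, and the final Neal-type acceptance step (4) is omitted.

```python
import math
import random

def MCMC(x, log_pi, step=0.5):
    """One Metropolis-Hastings step for the target with log-density log_pi.
    Returns the proposal on acceptance, or x itself on rejection."""
    y = x + random.gauss(0, step)
    if math.log(random.random()) < log_pi(y) - log_pi(x):
        return y        # MCMC acceptance
    return x            # MCMC rejection: the returned value equals x

def up_and_down(x, log_pi, betas):
    """Tempered 'up-and-down' pass through the powered targets pi^{1/beta}."""
    ys = [x]
    for b in betas + betas[::-1][1:]:        # up the ladder, then back down
        ys.append(MCMC(ys[-1], lambda t: log_pi(t) / b))
    return ys   # consecutive values may or may not differ

random.seed(1)
log_target = lambda t: -0.5 * t * t          # standard normal target
seq = up_and_down(0.0, log_target, [1.0, 2.0, 4.0])
print(len(seq))   # 6 values, including the starting point
```

Note that a rejection inside the ladder simply repeats the previous value in `seq`, exactly as in the question above; the whole sequence is then accepted or rejected as a block in the full scheme.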