## efficiency and the Fréchet-Darmois-Cramér-Rao bound

Posted in Books, Kids, Statistics on February 4, 2019 by xi'an

Following some entries on X validated, and after grading a mathematical statistics exam involving Cramér-Rao, or Fréchet-Darmois-Cramér-Rao to include both French contributors, I wonder as usual at the relevance of a concept of efficiency outside [and even inside] the restricted case of unbiased estimators. The general (frequentist) version is that the variance of an estimator δ of θ [or of any transform ψ(θ), replacing 1 with ψ'(θ) below] with bias b(θ) is bounded from below by

I(θ)⁻¹ (1+b'(θ))²

while a Bayesian version is the van Trees inequality, which bounds the integrated squared error loss from below by

(E(I(θ))+I(π))⁻¹

where I(θ) is the Fisher information, the expectation E is taken under the prior π, and I(π)=∫π'(θ)²/π(θ)dθ is the Fisher information of the prior density. But this opens a whole can of worms, in my opinion, since

• establishing that a given estimator is efficient requires computing both the bias and the variance of that estimator, not an easy task when considering a Bayes estimator or even the James-Stein estimator. I actually do not know whether any of the estimators dominating the standard Normal mean estimator has been shown to be efficient (although there exist closed-form expressions for the quadratic risk of the James-Stein estimator, including one of mine that the Canadian Journal of Statistics published verbatim in 1988). Or is there a result that a Bayes estimator associated with the quadratic loss is by default efficient in either the first or the second sense?
• while the initial Fréchet-Darmois-Cramér-Rao bound is restricted to unbiased estimators (i.e., b(θ)≡0) and cannot produce efficient estimators in any setting other than the mean parameter E_θ[T(X)] of an exponential family (T being the sufficient statistic), moving to the general case means there is one efficiency notion for every bias function b(θ), which makes the notion quite weak while not necessarily producing efficient estimators anyway, the major impediment to taking this notion seriously;
• moving from the variance to the squared error loss is not more “natural” than using any [other] convex combination of variance and squared bias, creating a whole new class of optimalities (a grocery of cans of worms!);
• I never got into the van Trees inequality, so I cannot say much, except that comparing various priors is delicate since the integrated risks are taken with respect to different measures on the parameter.
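
Both bounds can be checked numerically in the simplest Gaussian case. This is a minimal sketch, assuming X₁,…,Xₙ iid N(θ,σ²) with σ known, a shrinkage estimator δ=cX̄ and a N(0,τ²) prior; these choices and the function names are illustrative, not taken from the post. δ has bias b(θ)=(c−1)θ, so b'(θ)=c−1 and the biased bound equals c²σ²/n, which is exactly the variance of cX̄. Every c therefore yields an estimator that is "efficient" for its own bias function. In the same conjugate setting, E[I(θ)]=n/σ² and I(π)=1/τ², and the posterior mean attains the van Trees bound 1/(n/σ²+1/τ²). Nothing here says anything about non-Gaussian cases.

```python
import math
import random
import statistics

def shrinkage_cr_bound(c, sigma, n):
    """Biased Cramér-Rao bound (1 + b'(theta))^2 / I_n(theta) for delta = c * mean(X)."""
    fisher_info = n / sigma**2      # I_n(theta) = n / sigma^2 for N(theta, sigma^2)
    b_prime = c - 1.0               # bias b(theta) = (c - 1) * theta
    return (1.0 + b_prime) ** 2 / fisher_info   # = c^2 sigma^2 / n

def van_trees_bound(sigma, n, tau):
    """van Trees bound 1 / (E[I_n(theta)] + I(pi)) for a N(0, tau^2) prior."""
    expected_fisher = n / sigma**2  # constant in theta, so its prior expectation is itself
    prior_fisher = 1.0 / tau**2     # Fisher information of the N(0, tau^2) density
    return 1.0 / (expected_fisher + prior_fisher)

def simulate(c, sigma, n, tau, theta0=2.0, reps=20000, seed=1):
    """Monte Carlo variance of c*mean(X) at theta0, and Bayes risk of the posterior mean."""
    rng = random.Random(seed)
    se_mean = sigma / math.sqrt(n)  # mean(X) ~ N(theta, sigma^2 / n) exactly
    w = (n / sigma**2) / (n / sigma**2 + 1.0 / tau**2)  # posterior mean = w * mean(X)
    shrunk, sq_err = [], []
    for _ in range(reps):
        shrunk.append(c * rng.gauss(theta0, se_mean))       # fixed-theta draws
        theta = rng.gauss(0.0, tau)                         # theta drawn from the prior
        sq_err.append((w * rng.gauss(theta, se_mean) - theta) ** 2)
    return statistics.variance(shrunk), statistics.fmean(sq_err)

if __name__ == "__main__":
    var_hat, risk_hat = simulate(c=0.5, sigma=1.0, n=10, tau=1.0)
    print(var_hat, shrinkage_cr_bound(0.5, 1.0, 10))   # both near 0.025
    print(risk_hat, van_trees_bound(1.0, 10, 1.0))     # both near 1/11
```

`statistics.fmean` needs Python 3.8 or later. Drawing X̄ directly from N(θ,σ²/n) avoids simulating the n individual observations, since X̄ is sufficient here.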

## what does more efficient Monte Carlo mean?

Posted in Books, Kids, R, Statistics on March 17, 2017 by xi'an

“I was just thinking that there might be a magic trick to simulate directly from this distribution without having to go for less efficient methods.”

In a simple question on X validated a few days ago [about simulating from x²φ(x)], the person asking remarked that they wanted a direct simulation method for higher efficiency, as opposed to an accept-reject solution. This shows a misunderstanding of what “efficiency” means in Monte Carlo settings. If it means anything, I would think it is reflected in the average time taken to return one simulation, and possibly in the worst-case time. But there is no reason to call an inverse cdf method more efficient than an accept-reject or a transform approach, since it all depends on the time it takes to compute the inversion compared with the other solutions… Inverting the closed-form cdf, Φ(x)−xφ(x), is much more expensive in this example than generating a Gamma(3/2,½) variate and taking plus or minus its square root, so this is certainly the case here. Maybe a ziggurat method could be devised, especially since x²φ(x)<φ(x) when |x|<1, but I am not sure it is worth the effort!
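
The three routes (inversion, transform, accept-reject) can be timed side by side. This is a minimal sketch. The bisection inversion, the N(0,3) accept-reject proposal and all function names are illustrative choices, not taken from the thread. Because X² ~ Gamma(3/2, rate ½), that is χ²₃, the transform needs one Gamma draw and a random sign. The accept-reject step with a N(0,3) proposal accepts with probability e/(3√3) ≈ 0.52. The inversion has no closed-form quantile, so it calls erf and exp dozens of times per draw.

```python
import math
import random
import time

def cdf(x):
    """F(x) = Phi(x) - x*phi(x), the cdf of the density x^2 phi(x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return big_phi - x * phi

def draw_inverse_cdf(rng, lo=-12.0, hi=12.0, iters=60):
    """Inversion: no closed-form quantile, so solve F(x) = u by bisection."""
    u = rng.random()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_gamma_root(rng):
    """Transform: X^2 ~ Gamma(3/2, rate 1/2) (chi-squared, 3 df), then a random sign."""
    y = rng.gammavariate(1.5, 2.0)  # gammavariate(shape, scale): scale 2 means rate 1/2
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * math.sqrt(y)

def draw_accept_reject(rng):
    """Accept-reject from a N(0, 3) proposal; acceptance rate e/(3*sqrt(3)), about 0.52."""
    while True:
        x = rng.gauss(0.0, math.sqrt(3.0))
        t = x * x / 3.0
        # target / (M * proposal) simplifies to t * exp(1 - t), which is <= 1
        if rng.random() <= t * math.exp(1.0 - t):
            return x

def time_sampler(sampler, n=10000, seed=0):
    """Wall-clock time for n draws, plus the draws themselves."""
    rng = random.Random(seed)
    start = time.perf_counter()
    draws = [sampler(rng) for _ in range(n)]
    return time.perf_counter() - start, draws

if __name__ == "__main__":
    for name, f in [("inverse cdf", draw_inverse_cdf),
                    ("gamma root", draw_gamma_root),
                    ("accept-reject", draw_accept_reject)]:
        secs, xs = time_sampler(f)
        m2 = sum(x * x for x in xs) / len(xs)  # should be close to E[X^2] = 3
        print(f"{name:14s} {secs:.3f}s  mean x^2 = {m2:.3f}")
```

Timings depend on the machine and the Python build. The measured times, not the label of the method, are what should settle the "efficiency" question. All three samplers should return a mean of x² close to E[X²]=3.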

## analysing statistical and computational trade-off of estimation procedures

Posted in Books, pictures, Statistics, University life on July 8, 2015 by xi'an

“The collection of estimates may be determined by questions such as: How much storage is available? Can all the data be kept in memory or only a subset? How much processing power is available? Are there parallel or distributed systems that can be exploited?”

Daniel Sussman, Alexander Volfovsky, and Edoardo Airoldi from Harvard wrote a very interesting paper about striking a balance between statistical efficiency and computational efficiency, a theme that resonates with our recent work on ABC and with older considerations about the efficiency of Monte Carlo algorithms. While the paper avoids drifting towards computer science, even with a notion like algorithmic complexity, I like the introduction of a loss function in the comparison game, even though the way to combine both dimensions is unclear, which may limit the exercise to an intellectual game. In an ideal setting one would fix the computational time, as in “I have one hour to get this estimate”, and compare risks under that computing constraint, possibly dumping some observations from the sample to satisfy it. Ideally. Which is why this also reminds me of ABC: given an intractable likelihood, one starts by throwing away some data precision by using a tolerance ε, and usually more through an insufficient statistic. Hence ABC procedures could also be compared in such terms.
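
The ABC analogy can be made concrete with a toy rejection sampler whose computing budget is fixed as a number of model simulations. This is a minimal sketch under stated assumptions. It uses a hypothetical N(θ,1) model with a N(0,3²) prior. The names `abc_rejection`, `budget` and `prior_sd` are illustrative and do not come from the paper. Under a fixed budget, a smaller tolerance ε keeps fewer draws. Swapping the sufficient sample mean for the insufficient sample median is one way to throw away more information.

```python
import random
import statistics

def abc_rejection(observed, eps, budget, summary=statistics.fmean, prior_sd=3.0, seed=0):
    """ABC rejection with a fixed simulation budget (hypothetical toy model).

    Model: X_i ~ N(theta, 1); prior: theta ~ N(0, prior_sd^2).
    `budget` is the number of simulated data sets, standing in for computing time.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(budget):
        theta = rng.gauss(0.0, prior_sd)                  # proposal from the prior
        fake = [rng.gauss(theta, 1.0) for _ in range(n)]  # pseudo-data of the same size
        if abs(summary(fake) - s_obs) <= eps:             # keep if within tolerance eps
            accepted.append(theta)
    return accepted

if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(1.0, 1.0) for _ in range(20)]
    for eps in (0.5, 0.1, 0.02):
        for summ in (statistics.fmean, statistics.median):
            kept = abc_rejection(data, eps, budget=2000, summary=summ)
            centre = statistics.fmean(kept) if kept else float("nan")
            print(eps, summ.__name__, len(kept), centre)
```

With the same seed, the draws are identical across tolerances, so the accepted set for a smaller ε is a subset of the one for a larger ε. The trade-off between precision (ε) and the number of retained draws is then visible at constant cost.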

In the current paper, the authors only compare schemes that break the sample into bits so that each observation is handled only once, meaning it cannot be used in both the empirical mean and the empirical variance. This sounds a bit contrived, in that the optimal allocation depends on the value of the parameter the procedure attempts to estimate. Still, it could lead to a new form of bandit problem: given a bandit with as many arms as there are parameters, decide at each new observation on the allocation that minimises the overall risk. (There is a missing sentence at the end of Section 4.)
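
The dependence of the optimal allocation on the unknown parameter shows up in a toy version of the split. This is a minimal sketch under three assumptions: the data are Normal, the overall loss is the sum of the two mean squared errors, and the helpers `split_risk`, `best_split` and `mc_split_risk` are hypothetical names. It illustrates the sample-splitting idea and is not the authors' actual procedure. With n₁ points for the mean and n₂=n−n₁ for the unbiased variance estimate, the risk is σ²/n₁ + 2σ⁴/(n₂−1).

```python
import random
import statistics

def split_risk(n1, n, sigma2):
    """Sum of MSEs when n1 observations go to the mean and n - n1 to the variance.

    Mean from n1 points: MSE sigma^2 / n1.
    Unbiased s^2 from the other n2 = n - n1 Normal points: MSE 2 sigma^4 / (n2 - 1).
    """
    n2 = n - n1
    return sigma2 / n1 + 2.0 * sigma2**2 / (n2 - 1)

def best_split(n, sigma2):
    """Allocation n1 minimising split_risk; n1 ranges over 1..n-2 so that n2 >= 2."""
    return min(range(1, n - 1), key=lambda n1: split_risk(n1, n, sigma2))

def mc_split_risk(n1, n, mu, sigma2, reps=4000, seed=0):
    """Monte Carlo check of split_risk, using disjoint subsamples."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sd) for _ in range(n)]
        m = statistics.fmean(xs[:n1])      # first n1 points: mean only
        v = statistics.variance(xs[n1:])   # remaining points: variance only
        total += (m - mu) ** 2 + (v - sigma2) ** 2
    return total / reps

if __name__ == "__main__":
    for s2 in (0.1, 1.0, 10.0):
        print(s2, best_split(100, s2))  # the optimal n1 moves with sigma^2
```

Setting the derivative to zero gives n₁ ≈ (n−1)/(1+√2σ). With n=100, this is about 68 for σ²=0.1 and about 18 for σ²=10, so the best allocation is a function of the very quantity being estimated.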

Any direction for turning those considerations into a practical decision machine would be fantastic, although the difficulties are formidable, from choosing between estimators and selecting a class of estimators to computing costs and risks that depend on unknown parameters.