## Archive for maximum likelihood estimation

## Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life with tags ANU, Australia, Bayesian Essentials with R, Bayesian statistics, Brisbane, Dirk Kroese, introductory textbooks, Joshua Chan, Matlab, maximum likelihood estimation, Monte Carlo methods, Monte Carlo Statistical Methods, R, state space model on January 22, 2014 by xi'an

**D**irk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled *Statistical Modeling and Computation*, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or front pages…). The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, *Statistical Modeling and Computation* is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (The codes are available on the companion website, along with R translations.) It thus has a lot in common with our *Bayesian Essentials with R*, meaning that I am not the most appropriate or least ~~un~~biased reviewer for this book.

## optimal estimation of parameters (book review)

Posted in Books, Statistics with tags Bayesian theory, coding, inference, maximum capacity estimator, maximum likelihood estimation, Rissanen, Shannon's information, statistical theory on September 12, 2013 by xi'an

**As** I had read some of Jorma Rissanen’s papers in the early 1990s when writing *The Bayesian Choice*, I was quite excited to learn that Rissanen had written a book on the *optimal estimation of parameters*, where he presents and develops his own approach to statistical inference (estimation *and* testing). As explained in the Preface, the book was prompted by his having to deliver the 2009 Shannon Lecture at the Information Theory Society conference.

“Very few statisticians have been studying information theory, the result of which, I think, is the disarray of the present discipline of statistics.” J. Rissanen (p.2)

**N**ow that I have read the book (between Venezia, in the peaceful and shaded Fundamenta Sacca San Girolamo, and Hong Kong, so maybe in too leisurely and off-handed a manner), I am not so excited… It is not that the theory presented in *optimal estimation of parameters* is incomplete or ill-presented: the book is very well-written and well-designed, if in a highly personal (and borderline lone ranger) style. But the approach Rissanen advocates, namely maximum capacity as a generalisation of maximum likelihood, does not seem to relate to my statistical perspective and practice. Even though he takes great care to distance himself from Bayesian theory by repeating that the prior distribution is not necessary for his theory of *optimal estimation* (“priors are not needed in the general MDL principle”, p.4), my major source of incomprehension lies with the choice of incorporating the estimator within the data density to produce a new density, as in

$$\bar f(x) = \frac{f(x|\hat\theta(x))}{\int f(y|\hat\theta(y))\,\text{d}y}\,.$$

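Indeed, this leads to (a) replacing a statistical model with a structure that mixes the model and the estimation procedure and (b) peaking the new distribution by always choosing the most appropriate (local) value of the parameter. For a normal sample with unknown mean *θ* (and known variance σ²), this produces for instance a joint normal distribution that is degenerate since

$$f(x_1,\ldots,x_n|\hat\theta(x_1,\ldots,x_n)) \propto \exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\bar x_n)^2\Big\}\,,$$

a quadratic form of rank $n-1$ in $(x_1,\ldots,x_n)$.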

*(For a single observation it is not even defined.)* In a similar spirit, Rissanen defines this estimated model for dynamic data in a sequential manner, which means in the end that $x_1$ is used $n$ times, $x_2$ $n-1$ times, and so on. This asymmetry does not sound logical, especially when considering sufficiency.
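To see what plugging the estimator into the density and renormalising does in the simplest possible case, here is a minimal Python sketch for a Bernoulli model, where the normalising integral turns into a finite sum over the $2^n$ binary sequences (so, unlike the normal case, it is well-defined even for $n=1$). The Bernoulli choice and the function name `bernoulli_nml` are my own assumptions for illustration, not taken from the book.

```python
from math import comb


def bernoulli_nml(n):
    """NML for n Bernoulli trials: returns the normalising constant C_n and a
    function giving the NML probability of any one sequence with k ones."""

    def max_lik(k):
        # f(x | theta_hat(x)) with theta_hat = k/n (Python evaluates 0.0**0 as 1.0)
        p = k / n
        return p**k * (1 - p) ** (n - k)

    # Sum of the maximised likelihood over all 2^n sequences, grouped by k
    c_n = sum(comb(n, k) * max_lik(k) for k in range(n + 1))

    def nml_prob(k):
        return max_lik(k) / c_n

    return c_n, nml_prob


c_n, nml_prob = bernoulli_nml(2)
print(c_n)  # 2.5
```

With $n=2$, the two constant sequences have maximised likelihood 1 and the two mixed ones 1/4, hence $C_2=2.5$ and an NML probability of 0.4 for each constant sequence against 0.1 for each mixed one: the plug-in estimator visibly pushes mass towards the sequences it fits best.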

“…the misunderstanding that the more parameters there are in the model the better it is because it is closer to the ‘truth’ and the ‘truth’ obviously is not simple” J. Rissanen (p.38)

**A**nother point of contention with the approach advocated in *optimal estimation of parameters* is the inherent discretisation of the parameter space, which seems to exclude high-dimensional spaces and complex models. I somehow subscribe to the idea that a given sample (hence a given sample size) induces a maximum precision in the estimation that can be translated into using a finite number of parameter values, but the implementation suggested in the book is essentially unidimensional. I also find the notion of optimality inherent to the statistical part of *optimal estimation of parameters* quite tautological, as it ends up being a target that leads to the maximum likelihood estimator (or its pseudo-Bayesian counterpart).

“The BIC criterion has neither information nor a probability theoretic interpretation, and it does not matter which measure for consistency is selected.” J. Rissanen (p.64)

**T**he first part of the book is about coding and information theory; it amounts in my understanding to a justification of the Kullback-Leibler divergence, with an early occurrence (p.27) of the above estimation distribution. (The *channel capacity* is the normalising constant of this weird density.)

“…in hypothesis testing [where] the assumptions that the hypotheses are ‘true’ has misguided the entire field by generating problems which do not exist and distorting rational solutions to problems that do exist.” J. Rissanen (p.41)

I have issues with the definition of confidence intervals, as they rely on an implicit choice of a measure and have a constant coverage that decreases with the parameter dimension. This notion also seems to clash with the subsequent discretisation of the parameter space. Hypothesis testing *à la* Rissanen reduces to an assessment of goodness of fit, again with fixed coverage properties. Interestingly, the acceptance and rejection regions are based on two quantities, the likelihood ratio and the KL distance (p. 96), which leads to a delayed decision if they do not agree with respect to fixed bounds.

“A drawback of the prediction formulas is that they require the knowledge of the ARMA parameters.” J. Rissanen (p.141)

**A** final chapter on sequential (or dynamic) models reminded me that Rissanen was at the core of inventing variable order Markov chains. The remainder of this chapter provides some properties of the sequential normalised maximum likelihood estimator advocated by the author in the same spirit as the earlier versions. The whole chapter feels (to me) somewhat disconnected from
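For a feel of the sequential version, here is a minimal Python sketch in the same Bernoulli setting, assuming the usual definition of the sequential NML predictive in the MDL literature (the book's exact construction may differ): each candidate next symbol is appended, the MLE is refitted on the extended sequence, and the two maximised likelihoods are renormalised into a predictive probability. Since every refit involves the whole past, early observations re-enter each successive step. All function names are hypothetical.

```python
def ml_value(k, n):
    """Maximised Bernoulli likelihood of a length-n sequence with k ones."""
    p = k / n
    return p**k * (1 - p) ** (n - k)  # Python evaluates 0.0**0 as 1.0


def snml_predictive(ones, t):
    """Sequential NML probability that symbol t+1 is a 1, given t past
    symbols containing `ones` ones."""
    like1 = ml_value(ones + 1, t + 1)  # append a 1 and refit the MLE
    like0 = ml_value(ones, t + 1)      # append a 0 and refit the MLE
    return like1 / (like0 + like1)


def snml_sequence_prob(xs):
    """Joint probability of a 0/1 sequence as a product of SNML predictives."""
    prob, ones = 1.0, 0
    for t, x in enumerate(xs):
        p1 = snml_predictive(ones, t)
        prob *= p1 if x == 1 else 1 - p1
        ones += x
    return prob


print(snml_sequence_prob([1, 0]))  # 0.5 * 0.2, up to rounding
```

Because the joint probability is built as a product of conditionals that each sum to one, it is automatically a proper distribution over sequences, with no global normalising constant to compute.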

**I**n conclusion, Rissanen’s book is definitely an interesting entry into a perplexing vision of statistics. While I do not think it will radically alter our understanding and practice of statistics, it is worth perusing, if only to appreciate that there are still people (far?) out there attempting to bring a new vision of the field.

## A misleading title…

Posted in Books, R, Statistics, University life with tags bootstrap, fitting statistical distributions, generalized lambda distribution, GLDEX, International Statistical Review, John Tukey, Maple, Matlab, maximum likelihood estimation, quantile distribution, R on September 5, 2011 by xi'an

**W**hen I received this book, *Handbook of fitting statistical distributions with R*, by Z. Karian and E.J. Dudewicz, from/for the Short Book Reviews section of the *International Statistical Review*, I was obviously impressed by its size (around 1700 pages and 3 kilos…). From briefly glancing at the table of contents, and at the list of standard distributions appearing as subsections of the first chapters, I thought that the authors were covering different estimation/fitting techniques for most of the standard distributions. After taking a closer look at the book, I think the cover is misleading in several aspects: this is not a handbook (a.k.a. a reference book), it does not cover standard statistical distributions, the R input is marginal, and the authors only wrote part of the book, since about half of the chapters are written by other authors…

## Feedback on data cloning

Posted in Books, Statistics, Travel, University life with tags Banff, Biometrika, BIRS, data cloning, Ecology Letters, EM algorithm, Journal of Econometrics, maximum likelihood estimation, MCMC, Monte Carlo methods, prior feedback, SAME algorithm, simulated annealing, Statistics and Computing on September 22, 2010 by xi'an

**F**ollowing some discussions I had last week in Banff about data cloning, I re-read the 2007 “Data cloning” paper published in *Ecology Letters* by Lele, Dennis, and Lutscher. Once again, I see a strong similarity with our 2002 *Statistics and Computing* SAME algorithm, as well as with the subsequent (and equally similar) “A multiple-imputation Metropolis version of the EM algorithm” published in *Biometrika* by Gaetan and Yao in 2003, *Biometrika* being the journal to which Arnaud and I had earlier, and unsuccessfully, submitted this unpublished technical report on the convergence of the SAME algorithm… (The SAME algorithm is also described in detail in the 2005 book *Inference in Hidden Markov Models*, Chapter 13.)
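The core mechanism of data cloning is that replicating the observed sample $K$ times multiplies the log-likelihood by $K$, so the resulting posterior concentrates on the MLE as $K$ grows, while $K$ times the posterior variance approaches the asymptotic variance of the MLE. In practice this cloned posterior is explored by MCMC; the minimal Python sketch below instead assumes a normal model with known variance and a conjugate normal prior, so that it is available in closed form. The function name and default values are hypothetical choices for illustration.

```python
def cloned_posterior(xs, k, prior_mean=0.0, prior_var=10.0, sigma2=1.0):
    """Posterior mean and variance of theta when xs ~ N(theta, sigma2), known
    sigma2, are cloned k times and combined with a N(prior_mean, prior_var) prior."""
    n = len(xs)
    # Cloning multiplies the log-likelihood by k, i.e. the data precision by k
    precision = 1.0 / prior_var + k * n / sigma2
    mean = (prior_mean / prior_var + k * sum(xs) / sigma2) / precision
    return mean, 1.0 / precision


xs = [1.2, 0.4, 2.1, 1.7]  # toy data, sample mean 1.35
for k in (1, 10, 1000):
    m, v = cloned_posterior(xs, k)
    # m approaches the MLE (the sample mean) and k * v approaches sigma2 / n = 0.25
    print(k, m, k * v)
```

The prior's influence fades as $K$ grows, which is the same mechanism that makes SAME-type replication of the missing data converge to the marginal MLE.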

## Keynes’ derivations

Posted in Books, Statistics with tags A Treatise on Probability, Gauss, harmonic mean estimator, John Maynard Keynes, Laplace, maximum likelihood estimation on March 29, 2010 by xi'an

**C**hapter XVII of Keynes’ *A Treatise On Probability* contains Keynes’ most noteworthy contribution to Statistics, namely the classification of probability distributions such that the arithmetic/geometric/harmonic empirical mean/empirical median is also the maximum likelihood estimator. This problem was first stated by Laplace and Gauss (leading to the Laplace distribution in connection with the median and to the Gaussian distribution for the arithmetic mean). The derivation of the densities of those probability distributions is based on the constraint that the likelihood equation

$$\sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(x_i|\theta)\Big|_{\theta=\hat\theta(x_1,\ldots,x_n)} = 0$$

is satisfied for one of the four empirical estimates, using differential calculus (despite the fact that Keynes earlier derived Bayes’ theorem by assuming the parameter space to be discrete). Under regularity assumptions, in the case of the arithmetic mean, my colleague Eric Séré showed me this indeed leads to the family of distributions

$$f(x|\theta) = \exp\{\phi'(\theta)(x-\theta) + \phi(\theta) + \psi(x)\}\,,$$

where $\phi$ and $\psi$ are almost arbitrary functions under the constraints that $\phi$ is twice differentiable and $f(x|\theta)$ is a density in $x$. This means that $\phi$ satisfies

$$\phi(\theta) = -\log \int \exp\{\phi'(\theta)(x-\theta) + \psi(x)\}\,\text{d}x\,,$$

a constraint missed by Keynes.
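As a numerical sanity check of the arithmetic-mean characterisation, the Python sketch below maximises the log-likelihood of two members of the family with log-density $\phi'(\theta)(x-\theta)+\phi(\theta)+\psi(x)$ (the exponential distribution with mean $\theta$, and the unit-variance normal) and compares the maximiser with the sample mean; it also checks the Laplace/median case and the log-normal/geometric-mean case obtained through $y=\log x$. The choices of $\phi$ and $\psi$, the toy data and the ternary-search maximiser are my own illustrative assumptions, not Keynes' derivations.

```python
import math


def argmax_unimodal(loglik, lo, hi, iters=200):
    """Ternary search for the maximiser of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1  # the maximiser cannot lie in [lo, m1]
        else:
            hi = m2  # the maximiser cannot lie in [m2, hi]
    return (lo + hi) / 2


def keynes_loglik(xs, phi, dphi, psi):
    """Log-likelihood for log f(x|t) = phi'(t) (x - t) + phi(t) + psi(x)."""
    return lambda t: sum(dphi(t) * (x - t) + phi(t) + psi(x) for x in xs)


def laplace_loglik(xs):
    """Laplace log-likelihood in the location t, up to an additive constant."""
    return lambda t: -sum(abs(x - t) for x in xs)


def lognormal_loglik(xs):
    """Log-normal log-likelihood in the median t (unit log-scale), up to a constant.
    The Jacobian term -log x does not depend on t, so it does not move the MLE."""
    return lambda t: sum(-(math.log(x) - math.log(t)) ** 2 / 2 - math.log(x) for x in xs)


xs = [0.7, 1.9, 3.2, 0.4, 2.5]  # toy data, odd size so the median is unique
mean = sum(xs) / len(xs)
geo_mean = math.exp(sum(math.log(x) for x in xs) / len(xs))

# Exponential with mean t: phi(t) = -log t, phi'(t) = -1/t, psi(x) = -1
theta_expo = argmax_unimodal(
    keynes_loglik(xs, lambda t: -math.log(t), lambda t: -1 / t, lambda x: -1.0), 1e-6, 100.0)
# Unit-variance normal with mean t: phi(t) = t^2/2, phi'(t) = t
theta_normal = argmax_unimodal(
    keynes_loglik(xs, lambda t: t * t / 2, lambda t: t,
                  lambda x: -x * x / 2 - 0.5 * math.log(2 * math.pi)), -100.0, 100.0)
theta_laplace = argmax_unimodal(laplace_loglik(xs), -100.0, 100.0)
theta_geo = argmax_unimodal(lognormal_loglik(xs), 1e-6, 100.0)

print(mean, theta_expo, theta_normal)  # all three agree up to numerical error
print(sorted(xs)[len(xs) // 2], theta_laplace)  # sample median
print(geo_mean, theta_geo)  # geometric mean
```

The log-normal case also illustrates why a missing Jacobian goes unnoticed in the likelihood equation: the $-\log x$ term leaves the maximiser unchanged, and only shows up when normalising the density in $x$.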

**W**hile I cannot judge the level of novelty in Keynes’ derivation with respect to earlier works, this derivation therefore produces a generic form of unidimensional exponential family, twenty-five years before their rederivation by Darmois (1935), Pitman (1936) and Koopman (1936) as characterising distributions with sufficient statistics of constant dimension. The derivation of the distributions for which the geometric or the harmonic means are MLEs then follows by a change of variables, $x\mapsto\log x$ or $x\mapsto 1/x$, respectively. In those different derivations, the normalisation issue is treated quite off-handedly by Keynes, witness the function

at the bottom of page 198, which is not integrable in unless its support is bounded away from 0 or . Similarly, the derivation of the log-normal density on page 199 is missing the Jacobian factor (or in Keynes’ notations) and the same problem arises for the inverse-normal density, which should be

instead of (page 200). At last, I find the derivation of the distributions linked with the median rather dubious since Keynes’ general solution

(where the integral ought to be interpreted as a primitive) is such that the recovery of Laplace’s distribution, involves setting (page 201)

hence making a function of as well. The summary two pages later actually produces an alternative generic form, namely

with the difficulties that the distribution only vaguely depends on , being then a step function times and that, unless is properly calibrated, also depends on .

**G**iven that this part is the most technical section of the book, this post shows why I am fairly disappointed at having picked this book for my reading seminar. There is no further section with innovative methodological substance in the remainder of the book, which now appears to me as no better than a graduate dissertation on the probabilistic and statistical literature of the (not that) late 19th century, modulo the (inappropriate) highly critical tone.