Archive for textbook

practical Bayesian inference [book review]

Posted in Books, Kids, R, Statistics, University life on April 26, 2018 by xi'an

[Disclaimer: I received this book by Coryn Bailer-Jones for a review in the International Statistical Review and intend to submit a revised version of this post as my review. As usual, book reviews on the ‘Og reflect my own definitely personal and highly subjective views on the topic!]

It is always a bit of a challenge to review introductory textbooks as, on the one hand, they are rarely written at the level and with the focus one would personally choose to write them. And, on the other hand, it is all too easy to find issues with the material presented and the way it is presented… So be warned and proceed cautiously! In the current case, Practical Bayesian Inference tries to embrace too much, methinks, by starting from basic probability notions (that should not be unknown to physical scientists, I believe, and skipping which would avoid introducing a flat measure as a uniform distribution over the real line!, p.20), all the way to running MCMC for parameter estimation, comparing models by Bayesian evidence, and covering non-parametric regression and bootstrap resampling. For instance, priors only make their appearance on page 71. With a puzzling choice of an improper prior (?) leading to an improper posterior (??), which is certainly not the smoothest entry on the topic. “Improper posteriors are a bad thing”, indeed! And using truncation to turn them into proper distributions is not a clear improvement as the truncation point will significantly impact the inference. Discussing the choice of priors from the beginning has some appeal, but it may also create confusion in the novice reader (although one never knows!). Even asking about “what is a good prior?” (p.73) is not necessarily the best (and my recommended) approach to a proper understanding of the Bayesian paradigm. And arguing about the uniqueness of the prior (p.119) clashes with my own view of the prior being primarily a reference measure rather than an ideal summary of the available information. (The book argues at some point that there is no fixed model parameter, another and connected source of disagreement.) There is a section on assigning priors (p.113), but it only covers the case of a possibly biased coin without much realism. A feature common to many Bayesian textbooks though.
To return to the issue of improper priors (and posteriors), the book includes several warnings about the danger of hitting an undefined posterior (still called a distribution), without providing real guidance on checking for its definition. (A tough question, to be sure.)

“One big drawback of the Metropolis algorithm is that it uses a fixed step size, the magnitude of which can hardly be determined in advance…”(p.165)

When introducing computational techniques, quadratic (or Laplace) approximation of the likelihood is mingled with kernel estimators, which does not seem appropriate. Proposing to check convergence and calibrate MCMC via ACF graphs is helpful in low dimensions, but not in larger dimensions. And while warning about the dangers of forgetting the Jacobians in the Metropolis-Hastings acceptance probability when using a transform like η=ln θ is well-taken, the loose handling of changes of variables may be more confusing than helpful (p.167). Discussing and providing two R codes for the (standard) Metropolis algorithm may prove too much. Or not. But using a four-page R code for fitting a simple linear regression with a flat prior (pp.182-186) may definitely put the reader off! Even though I deem the example a proper experiment in setting a Metropolis algorithm and appreciate the detailed description around the R code itself. (I just take exception to the paragraph on running the code with two or even one observation, as the fact that “the Bayesian solution always exists” (p.188) [under a proper prior] is not necessarily convincing…)
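To make the Jacobian warning concrete, here is a minimal sketch, which is not the book's R code. It runs a random-walk Metropolis sampler on η=ln θ, with and without the Jacobian term. The Gamma(3, 2) target, the step size of 1.0, the seed and the function names are all assumptions chosen for illustration. Because the move is made on η, the density being explored is p(exp(η))·exp(η), so the log acceptance ratio picks up the extra term (η' − η). Dropping it makes the chain target a Gamma(2, 2) distribution instead.

```python
import math
import random


def log_target(theta, shape=3.0, rate=2.0):
    """Unnormalised log density of a Gamma(shape, rate) target on theta > 0."""
    return (shape - 1.0) * math.log(theta) - rate * theta


def metropolis_log_scale(n_iter, step, use_jacobian=True, seed=1):
    """Random-walk Metropolis on eta = ln(theta); returns the draws of theta."""
    rng = random.Random(seed)
    eta = 0.0  # start at theta = 1
    draws = []
    for _ in range(n_iter):
        prop = eta + rng.gauss(0.0, step)  # symmetric proposal on the eta scale
        log_ratio = log_target(math.exp(prop)) - log_target(math.exp(eta))
        if use_jacobian:
            # |d theta / d eta| = exp(eta), hence the extra (prop - eta) term
            log_ratio += prop - eta
        if rng.random() < math.exp(min(0.0, log_ratio)):
            eta = prop
        draws.append(math.exp(eta))
    return draws


with_jac = metropolis_log_scale(50_000, step=1.0)
without_jac = metropolis_log_scale(50_000, step=1.0, use_jacobian=False)
mean_with = sum(with_jac) / len(with_jac)
mean_without = sum(without_jac) / len(without_jac)
print(f"with Jacobian:    mean of theta = {mean_with:.2f} (Gamma(3, 2) mean is 1.5)")
print(f"without Jacobian: mean of theta = {mean_without:.2f} (Gamma(2, 2) mean is 1.0)")
```

The same acceptance step with the Jacobian line removed is a perfectly valid Metropolis sampler, only for the wrong target, which is why the omission is easy to miss in practice.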

“In the real world we cannot falsify a hypothesis or model any more than we “truthify” it (…) All we can do is ask which of the available models explains the data best.” (p.224)

In a similar format, the discussion on testing of hypotheses starts with a lengthy presentation of classical tests and p-values, the chapter ending up with a list of issues. Most of them reasonable within my own frame of reference. I also concur with the concluding remarks quoted above that what matters is a comparison of (all relatively false) models. What I agree less with [as predictable from earlier posts and papers] is the (standard) notion that comparing two models with a Bayes factor follows from the no information (in order to avoid the heavily loaded non-informative) prior weights of ½ and ½. Or similarly that the evidence is uniquely calibrated. Or, again, using a truncated improper prior under one of the assumptions (with the ghost of the Jeffreys-Lindley paradox lurking nearby…). While the Savage-Dickey approximation is mentioned, the first numerical resolution of the approximation to the Bayes factor is via simulations from the priors. Which may be very poor in the situation of vague and uninformative priors. And then the deadly harmonic mean makes an entry (p.242), along with nested sampling… There is also a list of issues about Bayesian model comparison, including (strong) dependence on the prior, dependence on irrelevant alternatives, lack of goodness of fit tests, computational costs, including calls to a possibly intractable likelihood function, ABC being then mentioned as a solution (which it is not, mostly).
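As a rough illustration of why simulating from the prior degrades when the prior gets vague, the sketch below approximates the evidence by averaging the likelihood over prior draws. It uses a toy model that is my assumption and not an example from the book: observations x_i ~ N(θ, 1) with prior θ ~ N(0, τ²), for which the evidence is available in closed form. The data values, the number of draws and the function names are made up. The effective sample size of the likelihood weights gives a crude measure of how many prior draws actually contribute.

```python
import math
import random

DATA = [0.8, 1.3, 0.2, 1.9, 1.1]  # made-up observations, for illustration only


def log_likelihood(theta):
    """Log likelihood of DATA under x_i ~ N(theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in DATA)


def exact_log_evidence(tau):
    """Closed-form log marginal likelihood when theta ~ N(0, tau^2)."""
    n, s, ss = len(DATA), sum(DATA), sum(x * x for x in DATA)
    quad = ss - tau ** 2 * s ** 2 / (1 + n * tau ** 2)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau ** 2)
            - 0.5 * quad)


def prior_sampling_estimate(tau, n_draws=20_000, seed=42):
    """Average the likelihood over prior draws; return (log estimate, ESS)."""
    rng = random.Random(seed)
    logs = [log_likelihood(rng.gauss(0.0, tau)) for _ in range(n_draws)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    log_estimate = top + math.log(total / n_draws)
    ess = total ** 2 / sum(w * w for w in weights)
    return log_estimate, ess


for tau in (1.0, 100.0):
    estimate, ess = prior_sampling_estimate(tau)
    print(f"tau={tau:>5}: exact {exact_log_evidence(tau):.3f}, "
          f"estimate {estimate:.3f}, effective sample size {ess:.0f}")
```

In this one-dimensional toy case the estimate remains usable even for τ=100, but the effective sample size collapses, and the waste only gets worse as the dimension of the parameter grows.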


my book available for a mere $1,091.50

Posted in Books on May 1, 2016 by xi'an

As I was looking at a link to my Bayesian Choice book on Amazon, I found that one site offered it for the modest sum of $1,091.50, a very slight increase when compared with the reference price of $59.95… I do wonder at the reason (scam?) behind this offer as such a large price is unlikely to attract any potential buyer to the site. (Obviously, if you are interested in this price, feel free to contact me!)

Statistics done wrong [book review]

Posted in Books, Kids, pictures, Statistics, University life on March 16, 2015 by xi'an

no starch press (!) sent me the pdf version of this incoming book, Statistics done wrong, by Alex Reinhart, towards writing a book review for CHANCE, and I read it over two flights, one from Montpellier to Paris last week, and one from Paris to B’ham this morning. The book is due to appear on March 16. It expands on a still existing website developed by Reinhart. (Discussed a year or so ago on Andrew’s blog, mostly in the comments, witness Andrew’s comment below.) Reinhart is, incidentally or not, a PhD candidate in statistics at Carnegie Mellon University. After apparently a rather substantial undergraduate foray into physics. Quite an unusual level of maturity and perspective for a PhD student..!

“It’s hard for me to evaluate because I am so close to the material. But on first glance it looks pretty reasonable to me.” A. Gelman

Overall, I found myself enjoying reading the book, even though I found the overall picture of the infinitely many mis-uses of statistics rather grim and a recipe for despairing of ever setting things straight..! Somehow, this is an anti-textbook, in that it warns about many ways of applying the right statistical technique in the wrong setting, without ever describing those statistical techniques. Actually without using a single maths equation. Which should be a reason good enough for me to let all hell break loose on that book! But, no, not really, I felt no compunction about agreeing with Reinhart’s warnings and if you have been reading Andrew’s blog for a while you should feel the same…

“Then again for a symptom like spontaneous human combustion you might get excited about any improvement.” A. Reinhart (p.13)

Maybe the limitation in the exercise is that statistics appears so fraught with dangers of over-interpretation and false positives, and that everyone (except physicists!) is so bound to make such invalidated leaps in conclusion, willingly or not, that it sounds like the statistical side of Gödel’s incompleteness theorem! Further, the book moves from recommendations at the individual level, i.e., on how one should conduct an experiment and separate data for hypothesis building from data for hypothesis testing, to a universal criticism of the poor standards of scientific publishing and the unavailability of most datasets and codes. Hence calling for universal reproducibility protocols that reminded me of the directions explored in this recent book I reviewed on that topic. (The one the rogue bird did not like.) It may be missing out on the bright side of things, for instance the wonderful possibility to use statistical models to produce simulated datasets that allow for an evaluation of the performances of a given procedure in the ideal setting. Which would have helped the increasingly depressed reader in finding ways of checking how wrong things could get..! But also on the dark side, as it does not say much about the fact that a statistical model is most presumably wrong. (Maybe a physicist’s idiosyncrasy!) There is a chapter entitled Model Abuse, but all it does is criticise stepwise regression and somehow botches the description of Simpson’s paradox.

“You can likely get good advice in exchange for some chocolates or a beer or perhaps coauthorship on your next paper.” A. Reinhart (p.127)

The final pages are however quite redeeming in that they acknowledge that scientists from other fields cannot afford a solid enough training in statistics and hence should hire statisticians as consultants for the data collection, analysis and interpretation of their experiments. A most reasonable recommendation!

Principles of scientific methods [not a book review]

Posted in Books, pictures, Statistics, University life on November 11, 2014 by xi'an

Mark Chang, author of Paradoxes in Scientific Inference and vice-president of AMAG Pharmaceuticals, has written another book entitled Principles of Scientific Methods. As was clear from my CHANCE review of Paradoxes in Scientific Inference, I did not find much appeal in this earlier book, even after the author wrote a reply (first posted on this blog and later printed in CHANCE). Hence a rather strong reluctance [of mine] to engage in another highly critical review when I received this new opus by the same author. [And the brainwave cover just put me off even further, although I do not want to start a review by criticising the cover, as it did not go that well with the previous attempts!]

After going through Principles of Scientific Methods, I became ever more bemused about the reason(s) for writing or publishing such a book, to the point I decided not to write a CHANCE review on it… (But, having spent some Métro rides on it, I still want to discuss why. Read at your own peril!)


machine learning [book review, part 2]

Posted in Books, R, Statistics, University life on October 22, 2013 by xi'an

The chapter (Chap. 3) on Bayesian updating or learning (a most appropriate term) for discrete data is well-done in Machine Learning, a probabilistic perspective, if a bit stretched (which is easy with 1000 pages left!). I like the remark (Section 3.5.3) about the log-sum-exp trick. While lengthy, the chapter (Chap. 4) on Gaussian models has the appeal of introducing LDA. The true chapter about Bayesian statistics (Chap. 5) only comes later, which seems a wee bit late to me, but it mentions the paper by Druilhet and Marin (2007) about the dependence of the MAP estimator on the dominating measure. The Bayesian chapter covers the Bayesian interpretation of false discovery rates, and decision theory (shared with the following chapter on frequentist statistics). This latter chapter also covers the pathologies of p-values. The chapter on regression has a paragraph on the g-prior and its extensions (p.238). There are chapters on DAGs, mixture models, EM (which mentions the early MCEM of Celeux and Diebolt!), factor and principal component analyses, Gaussian processes, CART models, HMMs and state-space models, MRFs, variational Bayes, belief and expectation propagations, and more… Most of the methods are implemented within a MATLAB package called PMTK (probabilistic modelling toolkit) that I did not check (because it is MATLAB!).
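For readers who have not met the log-sum-exp trick, here is a minimal sketch of the idea, written by me and not taken from the book or from PMTK. The function name and the three log-weights are illustrative assumptions. When normalising posterior probabilities stored on the log scale, exponentiating directly underflows to zero, so one subtracts the largest log-value first.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) without underflow, by factoring out the maximum."""
    top = max(log_values)
    if top == -math.inf:  # every term is exp(-inf) = 0
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in log_values))


# Hypothetical unnormalised log posterior weights for three classes.
log_joint = [-1000.0, -1001.0, -1002.0]

# The naive route fails: exp(-1000) is 0.0 in double precision.
naive_terms = [math.exp(v) for v in log_joint]

log_norm = log_sum_exp(log_joint)
posterior = [math.exp(v - log_norm) for v in log_joint]
print("naive terms:", naive_terms)
print("posterior probabilities:", [round(p, 4) for p in posterior])
```

The shift by the maximum leaves the result unchanged mathematically, since log Σ exp(v_i) = m + log Σ exp(v_i − m) for any m, but it guarantees that the largest term in the sum is exp(0) = 1.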

There are two (late!) chapters dedicated to simulation methods, Monte Carlo Inference (Chap. 23) and MCMC Inference (Chap. 24). (I am somewhat unhappy with the label Inference in those titles as those are simulation methods.) They cover the basics and more, including particle filters to some extent (but missing some of the most recent stuff, like Del Moral, Doucet & Jasra, 2006, or Andrieu, Doucet & Holenstein, 2010). (When introducing the Metropolis-Hastings algorithm, the author states the condition that the union of the supports of the proposal should include the support of the target but this is a rather formal condition as the Markov chain may still fail to be irreducible in that case.) My overall feeling is that too much is introduced in too little space, potentially confusing the student. See, e.g., the half-page Section 24.3.7 (p.855) on reversible jump MCMC. Or the other half-page on Hamiltonian MCMC (p.868). An interesting entry is the study of the performances of the original Gibbs sampler of Geman & Geman (1984), which started the field (to some extent). It states that, unless the hyperparameters are extremely well-calibrated, the Gibbs sampler suggested therein fails to produce a useful segmentation algorithm! The section on convergence diagnostics is rather limited and refers to rather oldish methods, rather than suggesting a multiple-chain empirical exploratory approach. Similarly, there is only one page (p.872) of introduction to marginal likelihood approximation techniques, half of which is wasted on the harmonic mean “worst Monte Carlo method ever”. And the other half is spent on criticising Besag’s candidate method exploited by Chib (1995).

Now, a wee bit more into detailed nitpicking (if only to feed the ‘Og!): first, the mathematical rigour is not always “au rendez-vous” and the handling of Dirac masses and conditionals and big-Oh (Exercise 3.20) is too hand-waving for my taste (see p.39 for an example). I also dislike the notion of the multinoulli distribution (p.35), first because it is a poor pun on Bernoulli’s name, second because sufficiency makes this distribution somewhat irrelevant when compared with the multinomial distribution. Although the book rather fairly covers the dangers and shortcomings of MAP estimators in Section (p.150), this remains the default solution. Monte Carlo is not “a city in Europe known for its plush gambling casinos” but the district of Monaco where the casino stands. And it writes Monte-Carlo in the original. The approximation of π by Monte Carlo is the one I used in my Aussie public lecture, but it would have been nice to know the number of iterations (p.54). The book unnecessarily and most vaguely refers to Taleb about the black swan paradox (p.77). The first introduction of Bayesian tests is to use the HPD interval and check whether the null value is inside, with a prosecutor’s fallacy in conclusion (p.137). BIC then AIC are introduced (p.162) and the reader remains uncertain about which one to use. If any. Not! The fact that the MLE and the posterior mean differ (p.165) is not a sign of informativeness in the prior. The processing of the label switching problem for mixtures (p.841) is confusing in that the inference problem (invariance by permutation that prohibits using posterior means) is compounded by the simulation problem (failing to observe this behaviour in simulations). The Rao-Blackwellisation Theorem (p.841) does not apply to other cases than two-stage Gibbs sampling, but this is not clear from the text. The adaptive MCMC amcmc package of Jeff Rosenthal is not mentioned (because it is in R?).
The proof of detailed balance (p.854-855) should take a line. Having so many references (35 pages) is both a bonus and a nuisance in a textbook, where students dislike the repeated occurrence of “see (so-&-so….”. I also dislike references being given with a parenthesis at all times, as in “See (Doucet et al. 2001) for details”. And, definitely the least important remark!, the quotes at the beginning are not particularly novel or relevant: the book could do without them. (Same thing for the “no free lunch theorem” which is not particularly helpful as presented…)
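On the π example, the reason the number of iterations matters is easily shown with a short sketch. This is my own minimal version of the standard quarter-disc estimator, not the code behind the book's figure or my lecture; the function name, seed and sample sizes are arbitrary assumptions. It reports the binomial standard error next to the estimate, and that error only shrinks at the rate 1/√n.

```python
import math
import random


def estimate_pi(n_draws, seed=0):
    """Estimate pi as 4 times the share of uniform points inside the quarter disc."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    p_hat = hits / n_draws
    estimate = 4.0 * p_hat
    # binomial standard error of the proportion, scaled by 4
    std_error = 4.0 * math.sqrt(p_hat * (1.0 - p_hat) / n_draws)
    return estimate, std_error


for n in (100, 10_000, 1_000_000):
    est, se = estimate_pi(n)
    print(f"n={n:>9,}: pi is about {est:.4f}, standard error {se:.4f}")
```

Gaining one extra decimal of precision therefore costs about a hundred times more simulations, which is why an approximation quoted without its sample size says very little.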

In conclusion, Machine Learning, a probabilistic perspective offers a fairly wide, unifying, and comprehensive perspective on the field of statistics, aka machine learning, that can certainly be used as the textbook in a Master program where this is the only course of statistics, aka machine learning. (Having not read other machine learning books thoroughly, I cannot judge how innovative it is. The beginning is trying to build the intuition of what the book is about before introducing the models. Just not my way of proceeding but mostly a matter of taste and maybe of audience…) The computational aspects are not treated in enough depth for my taste and my courses, but there are excellent books on those aspects. The Bayesian thread sometimes runs a wee bit thin, but remains a thread nonetheless throughout the book. Thus, a nice textbook for the appropriate course and a reference for many.

books versus papers [for PhD students]

Posted in Books, Kids, Statistics, University life on July 7, 2012 by xi'an

Before I run out of time, here is my answer to the ISBA Bulletin Students’ corner question of the term: “In terms of publications and from your own experience, what are the pros and cons of books vs journal articles?”

While I started on my first book during my postdoctoral years in Purdue and Cornell [a basic probability book made out of class notes written with Arup Bose, which died against the breakers of some referees’ criticisms], my overall opinion on this is that books are never valued by hiring and promotion committees for what they are worth! It is a universal constant I met in the US, the UK and France alike that books are not helping much for promotion or hiring, at least at an early stage of one’s career. Later, books become a more acknowledged part of senior academics’ vitae. So, unless one has a PhD thesis that is ready to be turned into a readable book without having any impact on one’s publication list, and even if one has enough material and a broad enough message at one’s disposal, my advice is to go solely and persistently for journal articles. Besides the above-mentioned attitude of recruiting and promotion committees, I believe this has several positive aspects: it forces the young researcher to maintain his/her focus on specialised topics in which she/he can achieve rapid prominence, rather than having to spend [quality research] time on replacing the background and building reference. It provides an evaluation by peers of the quality of her/his work, while reviews of books are generally on the light side. It is the starting point for building a network of collaborations, as few people are interested in writing books with strangers (when knowing it is already quite a hardship with close friends!). It is also the entry to workshops and international conferences, where a new book very rarely attracts invitations.

Writing a book is of course exciting and somewhat more deeply rewarding, but it is awfully time-consuming and requires a high level of organization young faculty members rarely possess when starting a teaching job at a new university (with possibly family changes as well!). I was quite lucky when writing The Bayesian Choice and Monte Carlo Statistical Methods to mostly be on leave from teaching, as it would have otherwise been impossible! That we are not making sufficient progress on our revision of Bayesian Core, started two years ago, is a good enough proof that even with tight planning, great ideas, enthusiasm, sale prospects, and available material, completing a book may get into trouble for mere organisational issues…

Carnon [and Core, end]

Posted in Books, Kids, pictures, R, Running, Statistics, Travel, University life on June 16, 2012 by xi'an

Yet another full day working on Bayesian Core with Jean-Michel in Carnon… This morning, I ran along the canal for about an hour and at last saw some pink flamingos close enough to take pictures (if only to convince my daughter that there were flamingos in the area!). Then I worked full-time on the spatial statistics chapter, using a small dataset on sedges that we found in Gaetan and Guyon’s Spatial Statistics and Modelling. I am almost done tonight, with both path sampling and ABC R codes documented and working for this dataset. But I’d like to re-run both codes for longer to achieve smoother outcomes.