## EntropyMCMC [R package]

Posted in Statistics with tags , , , , , , , , , , , , on March 26, 2019 by xi'an

My colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between current distribution and target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. [Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]

## let the evidence speak [book review]

Posted in Books, Kids, Statistics with tags , , , , , , , , , , on December 17, 2018 by xi'an

This book by Alan Jessop, professor at the Durham University Business School,  aims at presenting Bayesian ideas and methods towards decision making “without formula because they are not necessary; the ability to add and multiply is all that is needed.” The trick is in using a Bayes grid, in other words a two by two table. (There are a few formulas that survived the slaughter, see e.g. on p. 91 the formula for the entropy. Contained in the chapter on information that I find definitely unclear.) When leaving the 2×2 world, things become more complicated and the construction of a prior belief as a probability density gets heroic without the availability of maths formulas. The first part of the paper is about Likelihood, albeit not the likelihood function, despite having the general rule that (p.73)

belief is proportional to base rate x likelihood

which is the book‘s version of Bayes’ (base?!) theorem. It then goes on to discuss the less structure nature of prior (or prior beliefs) against likelihood by describing Tony O’Hagan’s way of scaling experts’ beliefs in terms of a Beta distribution. And mentioning Jaynes’ maximum entropy prior without a single formula. What is hard to fathom from the text is how can one derive the likelihood outside surveys. (Using the illustration of 1963 Oswald’s murder by Ruby in the likelihood chapter does not particularly help!) A bit of nitpicking at this stage: the sentence

“The ancient Greeks, and before them the Chinese and the Aztecs…”

is historically incorrect since, while the Chinese empire dates back before the Greek dark ages, the Aztecs only rule Mexico from the 14th century (AD) until the Spaniard invasion. While most of the book sticks with unidimensional parameters, it also discusses more complex structures, for which it relies on Monte Carlo, although the description is rather cryptic (use your spreadsheet!, p.133). The book at this stage turns into a more story-telling mode, by considering for instance the Federalist papers analysis by Mosteller and Wallace. The reader can only follow the process of assessing a document authorship for a single word, as multidimensional cases (for either data or parameters) are out of reach. The same comment applies to the ecology, archeology, and psychology chapters that follow. The intermediary chapter on the “grossly misleading” [Court wording] of the statistical evidence in the Sally Clark prosecution is more accessible in that (again) it relies on a single number. Returning to the ban of Bayes rule in British courts:

In the light of the strong criticism by this court in the 1990s of using Bayes theorem before the jury in cases where there was no reliable statistical evidence, the practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice.

the discussion found in the book is quite moderate and inclusive, in that a Bayesian analysis helps in gathering evidence about a case, but may be misunderstood or misused at the [non-Bayesian] decision level.

In conclusion, Let the Evidence Speak is an interesting introduction to Bayesian thinking, through a simplifying device, the Bayes grid, which seems to come from management, with a large number of examples, if not necessarily all realistic and some side-stories. I doubt this exposure can produce expert practitioners, but it makes for an worthwhile awakening for someone “likely to have read this book because [one] had heard of Bayes but were uncertain what is was” (p.222). With commendable caution and warnings along the way.

## evaluating stochastic algorithms

Posted in Books, R, Statistics, University life with tags , , , , , , , , on February 20, 2014 by xi'an

Reinaldo sent me this email a long while ago

```Could you recommend me a nice reference about
measures to evaluate stochastic algorithms (in
particular focus in approximating posterior
distributions).
```

and I hope he is still reading the ‘Og, despite my lack of prompt reply! I procrastinated and procrastinated in answering this question as I did not have a ready reply… We have indeed seen (almost suffered from!) a flow of MCMC convergence diagnostics in the 90’s.  And then it dried out. Maybe because of the impossibility to be “really” sure, unless running one’s MCMC much longer than “necessary to reach” stationarity and convergence. The heat of the dispute between the “single chain school” of Geyer (1992, Statistical Science) and the “multiple chain school” of Gelman and Rubin (1992, Statistical Science) has since long evaporated. My feeling is that people (still) run their MCMC samplers several times and check for coherence between the outcomes. Possibly using different kernels on parallel threads. At best, but rarely, they run (one or another form of) tempering to identify the modal zones of the target. And instances where non-trivial control variates are available are fairly rare. Hence, a non-sequitur reply at the MCMC level. As there is no automated tool available, in my opinion. (Even though I did not check the latest versions of BUGS.)

As it happened, Didier Chauveau from Orléans gave today a talk at Big’MC on convergence assessment based on entropy estimation, a joint work with Pierre Vandekerkhove. He mentioned SamplerCompare which is an R package that appeared in 2010. Soon to come is their own EntropyMCMC package, using parallel simulation. And k-nearest neighbour estimation.

If I re-interpret the question as focussed on ABC algorithms, it gets both more delicate and easier. Easy because each ABC distribution is different. So there is no reason to look at the unreachable original target. Delicate because there are several parameters to calibrate (tolerance, choice of summary, …) on top of the number of MCMC simulations. In DIYABC, the outcome is always made of the superposition of several runs to check for stability (or lack thereof). But this tells us nothing about the distance to the true original target. The obvious but impractical answer is to use some basic bootstrapping, as it is generally much too costly.

## the most human human

Posted in Books, University life with tags , , , , , , , , , , , , , , on May 24, 2013 by xi'an

“…the story of Homo sapiens trying to stake a claim on shifting ground, flanked on both sides by beast and machine, pinned between meat and math.” (p.13)

No typo in the title, this is truly how this book by Brian Christian is called. It was kindly sent to me by my friends from BUY and I realised I could still write with my right hand when commenting on the margin. (I also found the most marvellous proof to a major theorem but the margin was just too small…)  “The most human human: What artificial intelligence teaches us about being alive” is about the Turing test, designed to test whether an unknown interlocutor is a human or a machine. And eventually doomed to fail.

“The final test, for me, was to give the most uniquely human performance I could in Brighton, to attempt a successful defense against the machines.” (p.15)

What I had not realised earlier is that there is a competition every year running this test against a few AIs and a small group of humans, the judges (blindly) giving votes for each entity and selecting as a result the most human computer. And also the most human … human! This competition is called the Loebner Prize and it was taking place in Brighton, this most English of English seaside towns, in 2008 when Brian Christian took part in it (as a human, obviously!).

“Though both [sides] have made progress, the `algorithmic’ side of the field [of computer science] has, from Turing on, completely dominated the more `statistical’ side. That is, until recently.” (p.65)

I enjoyed the book, much more for the questions it brought out than for the answers it proposed, as the latter sounded unnecessarily conflictual to me, i.e. adopting a “us vs.’em” posture and whining about humanity not fighting hard enough to keep ahead of AIs… I dislike this idea of the AIs being the ennemy and of “humanity lost” the year AIs would fool the judges. While I enjoy the sci’ fi’ literature where this antagonism is exacerbated, from Blade Runner to Hyperion, to Neuromancer, I do not extrapolate those fantasised settings to the real world. For one thing, AIs are designed by humans, so having them winning this test (or winning against chess grand-masters) is a celebration of the human spirit, not a defeat! For another thing, we are talking about a fairly limited aspect of “humanity”, namely the ability to sustain a limited discussion with a set of judges on a restricted number of topics. I would be more worried if a humanoid robot managed to fool me by chatting with me for a whole transatlantic flight. For yet another thing, I do not see how this could reflect on the human race as a whole and indicate that it is regressing in any way. At most, it shows the judges were not trying hard enough (the questions reported in The most human human were not that exciting!) and maybe the human competitors had not intended to be perceived as humans.

“Does this suggest, I wonder, that entropy may be fractal?” (p.239)

Another issue that irked me in the author’s perspective is that he trained and elaborated a complex strategy to win the prize (sorry for the mini-spoiler: in case you did  not know, Brian did finish as the most human human). I do not know if this worry fear to appear less human than an AI was genuine or if it provided a convenient canvas for writing the book around the philosophical question of what makes us human(s). But it mostly highlights the artificial nature of the test, namely that one has to think in advance on the way conversations will be conducted, rather than engage into a genuine conversation with a stranger. This deserves the least human human label, in retrospect!

“So even if you’ve never heard of [Shanon entropy] beofre, something in your head intuits [it] every time you open your mouth.” (p.232)

The book spend a large amount of text/time on the victory of Deep Blue over Gary Kasparov (or, rather, on the defeat of Kasparov against Deep Blue), bemoaning the fact as the end of a golden age. I do not see the problem (and preferred the approach of Nate Silver‘s). The design of the Deep Blue software was a monument to the human mind, the victory did not diminish Kasparov who remains one of the greatest chess players ever, and I am not aware it changed chess playing (except when some players started cheating with the help of hidden computers!). The fact that players started learning more and more chess openings was a trend much before this competition. As noted in The most human human,  checkers had to change its rules once a complete analysis of the game had led to  a status-quo in the games. And this was before the computer era. In Glasgow, Scotland, in 1863. Just to draw another comparison: I like playing Sudoku and the fact that I designed a poor R code to solve Sudokus does not prevent me from playing, while my playing sometimes leads to improving the R code. The game of go could have been mentioned as well, since it proves harder to solve by AIs. But there is no reason this should not happen in a more or less near future…

“…we are ordering appetizers and saying something about Wikipedia, something about Thomas  Bayes, something about vegetarian dining…” (p.266)

While the author produces an interesting range of arguments about language, intelligence, humanity, he missed a part about the statistical modelling of languages, apart from a very brief mention of a Markov dependence. Which would have related to the AIs perspective. The overall flow is nice but somehow meandering and lacking in substance. Esp. in the last chapters. On a minor level, I also find that there are too many quotes from Hofstadter’ Gödel, Escher and Bach, as well as references to pop culture. I was surprised to find Thomas Bayes mentioned in the above quote, as it did not appear earlier, except in a back-note.

“A girl on the stairs listen to her father / Beat up her mother” C.D. Wright,  Tours

As a side note to Andrew, there was no mention made of Alan Turing’s chess rules in the book, even though both Turing and chess were central themes. I actually wondered if a Turing test could apply to AIs playing Turing’s chess: they would have to be carried by a small enough computer so that the robot could run around the house in a reasonable time. (I do not think chess-boxing should be considered in this case!)

Posted in Statistics, University life with tags , , , , on March 1, 2011 by xi'an

My (PhD level) reading seminar at CREST this year will be about some chapters of Jaynes’ Probability Theory. As announced earlier. The dates of the course are set as March 21 (11am), 24, 28, 31 and April 04 (2pm) at ENSAE (Malakoff, Salle 19). Attendance is free and everyone’s more than welcome, but registration is compulsory. The seminar is most effective when the audience has read the book chapters prior to the lecture, as it can engage into a higher debate! Several copies [10] of the book are available in the school library. (There was a version on-line at some point but it apparently got removed.) Here is the text of the announcement for the course next month:

Jeffreys and Jaynes share a lot in common as physicists who both significantly contributed to Bayesian statistical theory and as writers of books with almost identical titles and with very ambitious and similar scopes. It is thus no surprise that Jaynes dedicates his book to Jeffreys. There are also differences, the most obvious one being that Jeffreys published his foundational book before his 50th birthday, while Jaynes’ book came out more than ten years after his death (under the scholarly supervision of Larry Brethorst).
We plan to cover in the lectures what we consider to be the most significant aspects of Jaynes’s work. The corpus of work corresponding to the logical foundations of probability theory and the opposition of Jaynes to (Feller’s) measure theory, Bourbakism, Kolmogorov’s axioms, (Feller’s) countable additivity, de Finetti’s principles, and other probabilistic paradoxes will not be adressed, even though a second course by a probabilist colleague of mine at Dauphine may follow this one. The lectures will focus on

1. the definition and motivation of prior distributions (Chapter 6), culminating in the definition of the entropy principle (Chapter 11);
2. the rules of hypothesis testing (Chapter 4) and the central role of evidence (Chapters 9 and 18);
3. the special case of transformation groups (Chapter 12) and the debate about marginalisation paradoxes (Chapter 15)
4. Bayesian estimation (Chapter 6) and the criticisms on decision theory (Chapters 13 and 14)
5. Model comparison (Chapter 20) and the pathologies of orthodox methods (Chapters 16 and 17)

## Random sudokus [p-values]

Posted in R, Statistics with tags , , , , , , , on May 21, 2010 by xi'an

I reran the program checking the distribution of the digits over 9 “diagonals” (obtained by acceptable permutations of rows and column) and this test again results in mostly small p-values. Over a million iterations, and the nine (dependent) diagonals, four p-values were below 0.01, three were below 0.1, and two were above (0.21 and 0.42). So I conclude in a discrepancy between my (full) sudoku generator and the hypothesised distribution of the (number of different) digits over the diagonal. Assuming my generator is a faithful reproduction of the one used in the paper by Newton and DeSalvo, this discrepancy suggests that their distribution over the sudoku grids do not agree with this diagonal distribution, either because it is actually different from uniform or, more likely, because the uniform distribution I use over the (groups of three over the) diagonal is not compatible with a uniform distribution over all sudokus…

## Random [uniform?] sudokus

Posted in R, Statistics with tags , , , , , , on May 19, 2010 by xi'an

A longer run of the R code of yesterday with a million sudokus produced the following qqplot.

It does look ok but no perfect. Actually, it looks very much like the graph of yesterday, although based on a 100-fold increase in the number of simulations. Now, if I test the adequation with a basic chi-square test (!), the result is highly negative:

> chisq.test(obs,p=pdiag/sum(pdiag)) #numerical error in pdiag
Chi-squared test for given probabilities
data:  obs
X-squared = 6978.503, df = 6, p-value < 2.2e-16

(there are seven entries for both obs and pdiag, hence the six degrees of freedom). So this casts a doubt upon the uniformity of the random generator suggested in the paper by Newton and DeSalvo or rather on my programming abilities, see next post!