## Archive for inference

## indecent exposure

**W**hile attending my last session at MCqMC 2018, in Rennes, before taking a train back to Paris, I was confronted with this radical opinion on our previous work with Matt Moores (Warwick) and other coauthors from QUT, when the speaker, Maksym Byshkin from Lugano, defended a new approach to maximum likelihood estimation using novel MCMC methods. The approach is based on the fixed point equation characterising maximum likelihood estimators for exponential families, namely that the theoretical and empirical moments of the natural statistic are equal. Using a Markov chain with the said exponential family as its stationary distribution, the fixed point equation can be turned into a zero divergence equation, which requires simulating pseudo-data from the model, a model that depends on the unknown parameter. Breaking this circular argument, the authors note that simulating pseudo-data that reproduce the observed value of the sufficient statistic is enough. This relates to the famous paper of Geyer and Thompson (1992) on Monte Carlo maximum likelihood estimation. From there I was, and remain, lost, as I cannot see how a derivative of the expected divergence with respect to the parameter θ can be computed when this divergence is found by Monte Carlo rather than by exhaustive enumeration, nor how it can later be used in a stochastic gradient move on θ… especially when the null divergence is imposed on the parameter. In any case, the final slide shows an application of an Ising model to a large image, solving the problem (?) in 140 seconds and suggesting indecency, whereas our much slower approach is intended to produce a complete posterior simulation in this context.
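To make the fixed point equation of this post concrete, here is a minimal sketch, assuming a two-dimensional Ising model on an L×L torus with p_β(x) ∝ exp(β S(x)), where S(x) sums x_i x_j over neighbouring pairs. For such an exponential family the log-likelihood gradient is S(x_obs) − E_β[S(X)], so the maximum likelihood estimator solves E_β[S(X)] = S(x_obs). The sketch replaces the expectation with the statistic of a single Gibbs sweep of a persistent pseudo-data chain and moves β by a Robbins–Monro step a/t. This is a generic stochastic approximation scheme, not the speaker's algorithm; the function names, the step-size rule and the tuning values are illustrative assumptions.

```python
import math
import random


def neighbours(i, j, L):
    # the four nearest neighbours of site (i, j) on an L x L torus (assumes L >= 3)
    return (((i - 1) % L, j), ((i + 1) % L, j),
            (i, (j - 1) % L), (i, (j + 1) % L))


def suff_stat(x):
    # natural statistic S(x): sum of x_i * x_j over horizontal and vertical pairs
    L = len(x)
    return sum(x[i][j] * (x[(i + 1) % L][j] + x[i][(j + 1) % L])
               for i in range(L) for j in range(L))


def gibbs_sweep(x, beta, rng, s):
    # one systematic-scan Gibbs sweep for p(x) proportional to exp(beta * S(x));
    # x is modified in place and the updated value of S(x) is returned
    L = len(x)
    for i in range(L):
        for j in range(L):
            m = sum(x[a][b] for a, b in neighbours(i, j, L))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * m))
            new = 1 if rng.random() < p_plus else -1
            s += (new - x[i][j]) * m  # change in S caused by this site only
            x[i][j] = new
    return s


def simulate_ising(L, beta, n_sweeps, seed=0):
    # hypothetical helper producing a synthetic "observed" image
    rng = random.Random(seed)
    x = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    s = suff_stat(x)
    for _ in range(n_sweeps):
        s = gibbs_sweep(x, beta, rng, s)
    return x


def sa_mle(x_obs, beta0=0.0, n_iter=1000, a=1.0, seed=1):
    # Robbins-Monro solution of E_beta[S(X)] = S(x_obs); tuning is illustrative
    rng = random.Random(seed)
    L = len(x_obs)
    n_pairs = 2 * L * L
    s_obs = suff_stat(x_obs)
    x = [row[:] for row in x_obs]  # persistent pseudo-data chain
    s = s_obs
    beta = beta0
    for t in range(1, n_iter + 1):
        s = gibbs_sweep(x, beta, rng, s)
        # noisy gradient S(x_obs) - S(x_sim), scaled per pair, step size a / t
        beta += (a / t) * (s_obs - s) / n_pairs
    return beta
```

For instance, `sa_mle(simulate_ising(16, 0.3, 300))` should return a value in the vicinity of 0.3, up to the sampling variability of a single 16×16 image. Here the derivative is never computed explicitly: each sweep only supplies a noisy, unbiased-in-the-limit version of the moment difference.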

Posted in Statistics with tags ABC, Bayesian optimisation, Bretagne, Brittany, exponential families, image analysis, image processing, inference, Lugano, maximum likelihood estimation, MCqMC 2018, pre-processing, Rennes on July 27, 2018 by xi'an

## X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life with tags Bayesian Analysis, confidence intervals, credible intervals, Dennis Lindley, Harold Jeffreys, inference, Jerzy Neyman, maximum likelihood estimation, unbiasedness, University of Warwick, X-Outline on March 23, 2017 by xi'an

**W**hile visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other.”

This paper is fairly long, in part because Neyman starts by setting out Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to contrast his notion of probability with Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk of the paper explaining why constants cannot have anything but trivial probability measures, getting ready to state that an a priori distribution has no meaning (p.343) and that, in the rare cases where it does, it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval, which assumes repeatability of the experiment under the same conditions and therefore the same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the justification for insisting on unbiasedness is that it produces small errors, since this is not that often the case, as shown by the complete class theorems of Wald (ten years later), and that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random into a constant and vice versa, with the justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turns the bounds into fixed values for a given sample.
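The point that credible bounds are functions of the sample, and hence can be assessed on a frequentist basis, can be illustrated in the simplest possible case, assuming a normal model with known σ and a flat prior on the mean μ (my choice of example, not Neyman's). The 95% HPD interval is then x̄ ± 1.96σ/√n, numerically identical to the standard confidence interval, and repeating the experiment with the same μ should recover a coverage close to 0.95. The function names are hypothetical.

```python
import random

Z_975 = 1.959963984540054  # 0.975 quantile of the standard normal distribution


def flat_prior_credible_interval(sample, sigma):
    # normal model with known sigma and a flat prior on mu: the posterior is
    # N(xbar, sigma^2 / n), so the 95% HPD interval is xbar +/- z * sigma / sqrt(n)
    n = len(sample)
    xbar = sum(sample) / n
    half_width = Z_975 * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def frequentist_coverage(mu, sigma, n, n_rep=20000, seed=0):
    # the credible bounds are functions of the sample: repeat the experiment
    # with the same mu and record how often the interval contains mu
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = flat_prior_credible_interval(sample, sigma)
        hits += lower <= mu <= upper
    return hits / n_rep
```

Outside this conjugate, location-parameter case the credible and confidence bounds no longer coincide, and the frequentist coverage of a credible interval then depends on the prior, which is where probability-matching arguments come in.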

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. Neyman also rejects the possibility of an empty confidence interval, a stance which, while of practical interest, clashes with its frequentist backup.
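The duality between acceptance regions and confidence intervals can be sketched for a binomial observation, assuming equal-tailed level-α tests of H0: p = θ (an illustration of mine, not the construction in Neyman's paper). The confidence set collects every θ whose acceptance region contains the observed x; computed on a grid, this inversion reproduces the Clopper–Pearson interval up to grid resolution. The grid size and function names are illustrative assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p); equals 0 when k < 0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def in_acceptance_region(x, n, theta, alpha):
    # x is accepted by the equal-tailed level-alpha test of H0: p = theta
    lower_tail = binom_cdf(x, n, theta)            # P_theta(X <= x)
    upper_tail = 1.0 - binom_cdf(x - 1, n, theta)  # P_theta(X >= x)
    return lower_tail > alpha / 2 and upper_tail > alpha / 2


def inverted_interval(x, n, alpha=0.05, grid_size=10001):
    # confidence set: every grid value theta whose acceptance region contains x
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    kept = [t for t in grid if in_acceptance_region(x, n, t, alpha)]
    if not kept:
        return None  # guard: no grid value retained
    return min(kept), max(kept)
```

For x = 0 out of n = 10, the upper bound comes out near 1 − 0.025^(1/10) ≈ 0.3085, the Clopper–Pearson value; here the ordering of the sample space is imposed by the equal-tailed choice of tests.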

## Monte Carlo methods for Potts models

Posted in pictures, Statistics, University life with tags computation theory, importance sampling, inference, information theory, Potts model on March 10, 2016 by xi'an

**T**here will be a seminar talk by Mehdi Molkaraie (Pompeu Fabra) next week at Institut Henri Poincaré (IHP), Paris, on his paper with Vincent Gomez.

We consider the problem of estimating the partition function of the ferromagnetic q-state Potts model. We propose an importance sampling algorithm in the dual of the normal factor graph representing the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters of the model are strong (corresponding to models at low temperature) or when the model contains a mixture of strong and weak couplings. We show that, in this setting, the proposed algorithm significantly outperforms the state of the art methods.

The talk is at 14:30, March 17. It is part of a trimester program on information and computation theories I was completely unaware of.
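As a point of comparison for the abstract, here is a naive baseline, not the dual-factor-graph sampler of the paper: importance sampling of the ferromagnetic q-state Potts partition function with a uniform proposal, checked against brute-force enumeration on a 3×3 torus. The assumptions are the weight exp(β × number of agreeing neighbour pairs), periodic boundaries, and the function names, which are hypothetical. As β grows (strong couplings, low temperature), the weight concentrates on nearly single-coloured configurations that a uniform proposal rarely visits, and the variance of this estimator explodes, which is precisely the regime the abstract targets.

```python
import itertools
import math
import random


def agreements(x, L):
    # number of equal-colour neighbour pairs on an L x L torus (assumes L >= 3),
    # x being a flat list of L*L colours in {0, ..., q-1}
    return sum((x[i * L + j] == x[((i + 1) % L) * L + j])
               + (x[i * L + j] == x[i * L + (j + 1) % L])
               for i in range(L) for j in range(L))


def exact_log_partition(L, q, beta):
    # brute-force log Z with Z = sum_x exp(beta * agreements(x)); tiny grids only
    return math.log(sum(math.exp(beta * agreements(x, L))
                        for x in itertools.product(range(q), repeat=L * L)))


def naive_is_log_partition(L, q, beta, n_samples=20000, seed=0):
    # importance sampling with the uniform proposal g(x) = q^(-L*L):
    # Z = q^(L*L) * E_g[exp(beta * agreements(X))]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.randrange(q) for _ in range(L * L)]
        total += math.exp(beta * agreements(x, L))
    return L * L * math.log(q) + math.log(total / n_samples)
```

At a moderate coupling such as β = 0.5 with q = 3, the two log partition functions should agree closely; rerunning with a much larger β is a quick way to watch the naive estimate deteriorate.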

## causality

Posted in Books, Statistics, University life with tags Annual Review of Statistics and Its Application, Bayes nets, Bayesian ideas, book review, causality, Causality in the Sciences, David Hume, inference, OUP, philosophy of sciences, repeatability of experiments on March 7, 2016 by xi'an

**O**xford University Press sent me this book by Phyllis Illari and Federica Russo, Causality (Philosophical theory meets scientific practice) a ~~little~~ while ago. (The book appeared in 2014.) Unless I asked for it, I cannot remember…

“The problem is whether and how to use information of general causation established in science to ascertain individual responsibility.” (p.38)

As the subtitle indicates, this is a philosophy book, not a statistics book. And not particularly intended for statisticians. Hence, I am not exactly qualified to analyse its contents, and even less to criticise its lack of connection with statistics. But this being a blog post… I read rather slowly through the book, which exposes a wide range (“a map”, p.8) of approaches and perspectives on the notions of causality, some ways to infer about causality, and the point of doing all this, concluding with a relativistic (and thus eminently philosophical) viewpoint defending a “pluralistic mosaic” or a “causal mosaic” that relates to all existing accounts of causality as they “each do something valuable” (p.258). From a naïve bystander perspective, this sounds like a new avatar of deconstructionism applied to causality.

“Simulations can be very illuminating about various phenomena that are complex and have unexpected effects (…) can be run repeatedly to study a system in different situations to those seen for the real system…” (p.15)

This is not to state that the book is uninteresting, as it provides a wide entry into philosophical attempts at categorising and defining causality, if not into the statistical aspects of the issue. (For instance, the question of whether or not causality can be proven uniquely from a statistical perspective is not mentioned.) Among the interesting points in the early chapters is a section (2.5) on simulation, which however misses the depth of this earlier book on climate simulations I reviewed while in Monash, or of the discussions at the interdisciplinary seminar last year in Hanover. I.J. Good’s probabilistic causality is mentioned but hardly detailed. (With the warning remark that one “should not confuse predictability with determinism [and] determinism with causality”, p.82.)

## Principles of scientific methods [not a book review]

Posted in Books, pictures, Statistics, University life with tags book cover, book review, CHANCE, induction, inference, Mark Chang, paradoxes, scientific methods, swarm algorithms, textbook, wikipedia on November 11, 2014 by xi'an

**M**ark Chang, author of *Paradoxes in Scientific Inference* and vice-president of AMAG Pharmaceuticals, has written another book entitled *Principles of Scientific Methods*. As was clear from my CHANCE review of *Paradoxes in Scientific Inference*, I did not find much appeal in this earlier book, even after the author wrote a reply (first posted on this blog and later printed in CHANCE). Hence a rather strong reluctance [of mine] to engage in another highly critical review when I received this new opus by the same author. *[And the brainwave cover just put me off even further, although I do not want to start a review by criticising the cover, as it did not go that well with the previous attempts!]*

After going through *Principles of Scientific Methods*, I became ever more bemused by the reason(s) for writing or publishing such a book, to the point that I decided not to write a CHANCE review of it… (But, having spent some Métro rides on it, I still want to discuss why. Read at your own peril!)

## from statistical evidence to evidence of causality

Posted in Books, Statistics with tags Bayesian inference, causality, confounders, counterfactuals, George Casella, inference, Ithaca, JASA, odds ratio, risk ratio on December 24, 2013 by xi'an

**I** took the opportunity of having to wait a long while at a local administration today (!) to read an arXived paper by Dawid, Musio and Fienberg on the difficulty, both philosophical and practical, of establishing the probabilities of the causes of effects. The first interesting thing about the paper is that it relates to the Médiator drug scandal that took place in France in the past year and whose trial is still under way: thanks to the investigations of a local doctor, Irène Frachon, the drug was exposed as an aggravating factor for heart disease. Or maybe the cause. The case-control study of Frachon summarises into a 2×2 table with a corrected odds ratio of 17.1. From there, the authors expose the difficulties of drawing inference about causes of effects, i.e. causality, an aspect of inference that has always puzzled me. (And the paper led me to search for the distinction between odds ratio and risk ratio.)
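For the odds ratio versus risk ratio distinction, a minimal sketch on a 2×2 table with hypothetical counts (not the Médiator data); the function names are illustrative.

```python
def risk_ratio(a, b, c, d):
    # 2x2 table: a, b = exposed with / without the outcome,
    #            c, d = unexposed with / without the outcome
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    # ratio of the odds of the outcome among exposed and among unexposed
    return (a * d) / (b * c)
```

With counts (20, 80, 10, 90), the risk ratio is 2 while the odds ratio is 2.25. Multiplying the "without outcome" counts b and d by a common sampling factor, as a case-control design effectively does, leaves the odds ratio unchanged but alters the naive risk ratio, which is why case-control studies report odds ratios; for a rare outcome the two ratios nearly coincide.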

“And the conceptual and implementational difficulties that we discuss below, that beset even the simplest case of inference about causes of effects, will be hugely magnified when we wish to take additional account of such policy considerations.”

**A** third interesting notion in the paper is the inclusion of counterfactuals. My introduction to counterfactuals dates back to a run in the back-country roads around Ithaca, New York, when George told me about a discussion paper from Phil he was editing for JASA on that notion with his philosopher neighbour Steven Schwartz as a discussant. (It was a great run, presumably in the late Spring. And the best introduction I could dream of!) Now, the paper starts from the counterfactual perspective to conclude that inference is close to impossible in this setting. Within my limited understanding, I would see that as a drawback of using counterfactuals, rather than of drawing inference about causes. If the corresponding statistical model is nonidentifiable, because one of the two responses is always missing, the model seems inappropriate. I am also surprised at the notion of “sufficiency” used in the paper, since it sounds like the background information cancels the need to account for the treatment (e.g., aspirin) decision. The fourth point is the derivation of bounds on the probabilities of causation, despite everything! Quite an interesting read thus!

## machine learning [book review]

Posted in Books, R, Statistics, University life with tags Bayesian statistics, clustering, data analysis, inference, machine learning, MAP estimators, MIT Press, statistics book on October 21, 2013 by xi'an

**I** have to admit the rather embarrassing fact that *Machine Learning, A probabilistic perspective* by Kevin P. Murphy is the first machine learning book I have really read in detail…! It is a massive book with close to 1,100 pages and I thus hesitated to carry it around, until I grabbed it in my bag for Warwick. (And in the train to Argentan.) It is also massive in its contents as it covers most (all?) of what I call statistics (but visibly corresponds to machine learning as well!). With a Bayesian bent most of the time (which is the secret meaning of *probabilistic* in the title).

“…we define machine learning as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty (such as planning how to collect more data!).” (p.1)

**A**part from the Introduction (which I find rather confusing for not dwelling on the nature of errors and randomness and on the reason for using probabilistic models, since they are all wrong, and charming for including a picture of the author’s family as an illustration of face recognition algorithms), I cannot say I found the book more lacking in foundations or in the breadth of methods and concepts it covers than a “standard” statistics book. In short, this is a perfectly acceptable statistics book! Furthermore, it has a very relevant and comprehensive selection of references (sometimes favouring “machine learning” references over “statistics” references!). Even the vocabulary seems pretty standard to me. All this makes me wonder why we distinguish at all between the two domains, following Larry Wasserman’s views (for once!) that the difference is mostly in the eye of the beholder, i.e. in which department one teaches… This was already my perspective before I read the book, and the book reinforced it. And the author agrees as well *(“The probabilistic approach to machine learning is closely related to the field of statistics, but differs slightly in terms of its emphasis and terminology”, p.1).* Let us all unite!

[..part 2 of the book review to appear tomorrow…]