Archive for University of Warwick

X-Outline of a Theory of Statistical Estimation

Posted in Books, Statistics, University life on March 23, 2017 by xi'an

While visiting Warwick last week, Jean-Michel Marin pointed out and forwarded me this remarkable paper of Jerzy Neyman, published in 1937, and presented to the Royal Society by Harold Jeffreys.

“Leaving apart on one side the practical difficulty of achieving randomness and the meaning of this word when applied to actual experiments…”

“It may be useful to point out that although we are frequently witnessing controversies in which authors try to defend one or another system of the theory of probability as the only legitimate, I am of the opinion that several such theories may be and actually are legitimate, in spite of their occasionally contradicting one another. Each of these theories is based on some system of postulates, and so long as the postulates forming one particular system do not contradict each other and are sufficient to construct a theory, this is as legitimate as any other.”

This paper is fairly long, in part because Neyman starts by setting out Kolmogorov’s axioms of probability. This is of historical interest but also needed for Neyman to oppose his notion of probability to Jeffreys’ (which is the same from a formal perspective, I believe!). He actually spends a fair chunk on explaining why constants cannot have anything but trivial probability measures, getting ready to state that an a priori distribution has no meaning (p.343) and that, in the rare cases where it does, it is mostly unknown. While reading the paper, I thought that the distinction was more in terms of frequentist or conditional properties of the estimators, Neyman’s arguments paving the way to his definition of a confidence interval, which assumes repeatability of the experiment under the same conditions and therefore the same parameter value (p.344).

“The advantage of the unbiassed [sic] estimates and the justification of their use lies in the fact that in cases frequently met the probability of their differing very much from the estimated parameters is small.”

“…the maximum likelihood estimates appear to be what could be called the best “almost unbiassed [sic]” estimates.”

It is also quite interesting to read that the principle for insisting on unbiasedness is one of producing small errors, because this is not that often the case, as shown by the complete class theorems of Wald (ten years later). And that maximum likelihood is somewhat relegated to a secondary rank, almost unbiased being understood as consistent. A most amusing part of the paper is when Neyman inverts the credible set into a confidence set, that is, turning what is random into a constant and vice-versa. With a justification that the credible interval has zero or one coverage, while the confidence interval has a long-run validity of returning the correct rate of success. What is equally amusing is that the boundaries of a credible interval turn into functions of the sample, hence could be evaluated on a frequentist basis, as done later by Dennis Lindley and others like Welch and Peers, but that Neyman fails to see this and turns the bounds into hard values. For a given sample.
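
To make the long-run coverage argument concrete, here is a minimal Python sketch (not taken from Neyman's paper) that repeats a toy experiment many times under the same parameter value. The normal model with known unit variance, the function name `coverage`, and all default values are assumptions chosen purely for illustration.

```python
import random

def coverage(theta, n=20, reps=5000, z=1.96, seed=1):
    """Long-run fraction of samples whose 95% interval contains theta.

    Assumed toy model: n i.i.d. N(theta, 1) observations, variance known.
    """
    rng = random.Random(seed)
    half_width = z / n ** 0.5          # z * sigma / sqrt(n) with sigma = 1
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        # For this particular sample, the interval either covers theta or not.
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    print(coverage(theta=0.3))
```

For any single sample the realised interval contains the parameter or it does not; only the proportion over repetitions is expected to sit near the nominal 95%, whatever value of `theta` is plugged in.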

“This, however, is not always the case, and in general there are two or more systems of confidence intervals possible corresponding to the same confidence coefficient α, such that for certain sample points, E’, the intervals in one system are shorter than those in the other, while for some other sample points, E”, the reverse is true.”

The resulting construction of a confidence interval is then awfully convoluted when compared with the derivation of an HPD region, going through regions of acceptance that are the dual of a confidence interval (in the sampling space), while apparently [from my hasty read] missing a rule to order them. And rejecting the notion of a confidence interval being possibly empty, which, while being of practical interest, clashes with its frequentist backup.

truth or truthiness [book review]

Posted in Books, Kids, pictures, Statistics, University life on March 21, 2017 by xi'an

This 2016 book by Howard Wainer has been sitting (!) on my desk for quite a while and it took a long visit to Warwick to find a free spot to quickly read it and write my impressions. The subtitle is, as shown on the picture, “Distinguishing fact from fiction by learning to think like a data scientist”. With all due respect to the book, which illustrates quite pleasantly the dangers of (pseudo-)data mis- or over- (or even under-)interpretation, and to the author, who has repeatedly emphasised those points in his books and opinion columns, including those in CHANCE, I do not think the book teaches how to think like a data scientist, in that an arbitrary neophyte reader would not manage to handle a realistic data-centric situation without deeper training. But this collection of essays, some of which were opinion columns, makes for a nice read nonetheless.

I presume that in this post-truth and alternative facts [dark] era, the notion of truthiness is familiar to most readers! It is often based on a misunderstanding or a misappropriation of data leading to dubious and unfounded conclusions. The book runs through dozens of examples (some of them quite short and mostly appealing to common sense) to show how this happens and to some extent how this can be countered. If not avoided as people will always try to bend, willingly or not, the data to their conclusion.

There are several parts and several themes in Truth or Truthiness, with different degrees of depth and novelty. The more involved part is in my opinion the one about causality, with illustrations in educational testing, psychology, and medical trials. (The illustration about fracking and the resulting impact on Oklahoma earthquakes should not be in the book, except that there exist officials publicly denying the facts. The same remark applies to the testing cheat controversy, which would be laughable had not someone ended up the victim!) The section on graphical representation and data communication is less exciting, presumably because it comes after Tufte’s books and message. I also feel the 1854 cholera map of John Snow is somewhat over-exploited, since he only drew the map after the epidemic declined. The final chapter Don’t Try this at Home is quite anecdotal and at the same time this may be the whole point, namely that in mundane questions thinking like a data scientist is feasible and leads to sometimes surprising conclusions!

“In the past a theory could get by on its beauty; in the modern world, a successful theory has to work for a living.” (p.40)

The book reads quite nicely, as a whole and a collection of pieces, from which class and talk illustrations can be borrowed. I like the “learned” tone of it, with plenty of citations and witticisms, some in Latin, Yiddish and even French. (Even though the latter is somewhat inaccurate! Si ça avait pu se produire, ça avait dû se produire [p.152] would have sounded more vernacular in my Gallic opinion!) I thus enjoyed unreservedly Truth or Truthiness, for its rich style and critical message, all the more needed in the current times, and far from comparing it with a bag of potato chips as Andrew Gelman did, I would like to stress its classical tone, in the sense of being immersed in a broad and deep culture that seems to be receding fast.

random forests [reading group]

Posted in Books, Kids, Statistics, University life on March 14, 2017 by xi'an

Here are the slides I prepared (and recycled) over the weekend for the reading group on machine-learning that recently started in Warwick. Where I am for two consecutive weeks.

poverty of medieval students

Posted in Books, Kids, pictures, Travel, University life on March 11, 2017 by xi'an

While waiting for a new staff card in the Human Resources building at the University of Warwick, I browsed through a THE issue and came upon this rather bizarre article by Jack Grove, reporting on a scholarly paper on the tuition and living fees of medieval students, i.e. around the 14th and 15th centuries in Britain, France, or Italy [which did not exist at the time]. Bizarre in that it seemed obvious to me that education in the Middle Ages was severely restricted to a tiny margin of the society…

back in Oxford

Posted in pictures, Statistics, Travel, University life on January 30, 2017 by xi'an

As in the previous years, I am back in Oxford (England) for my short Bayesian Statistics course in the joint Oxford-Warwick PhD programme, OxWaSP. For some unclear reason, presumably related to the Internet connection from Oxford, I have not been able to upload my slides to Slideshare, so here is the [99.9% identical] older version:

anytime algorithm

Posted in Books, Statistics on January 11, 2017 by xi'an

Lawrence Murray, Sumeet Singh, Pierre Jacob, and Anthony Lee (Warwick) recently arXived a paper on Anytime Monte Carlo. (The earlier post on this topic is no coincidence, as Lawrence had told me about this problem when he visited Paris last Spring. Including a forced extension when his passport got stolen.) The difficulty with anytime algorithms for MCMC is the lack of exchangeability of the MCMC sequence (except for formal settings where regeneration can be used).

When accounting for the duration of computation between steps of an MCMC generation, the Markov chain turns into a Markov jump process, whose stationary distribution α is biased by the average delivery time. Unless it is constant. The authors manage this difficulty by interlocking the original chain with a secondary chain so that even- and odd-index chains are independent. The secondary chain is then discarded. This provides a way to run an anytime MCMC. The principle can be extended to K+1 chains, run one after the other, since only one of those chains need be discarded. It also applies to SMC and SMC². The appeal of anytime simulation in this particle setting is that resampling is no longer a bottleneck. Hence easily distributed among processors. One aspect I do not fully understand is how the computing budget is handled, since allocating the same real time to each iteration of SMC seems to envision each target in the sequence as requiring the same amount of time. (An interesting side remark made in this paper is the lack of exchangeability resulting from elaborate resampling mechanisms, a lack I had not thought of before.)
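
The length bias can be checked on a toy example. The sketch below is not the authors' algorithm: it assumes a two-state chain whose moves are i.i.d. uniform draws (so that its stationary distribution is (1/2, 1/2)) and exponential computing times whose mean depends on the current state. The names `state_at_time` and `anytime_frequency` and the values in `mean_hold` are hypothetical.

```python
import random

def state_at_time(T, mean_hold, rng):
    """State occupied at real time T by a toy Markov jump process."""
    t = 0.0
    x = rng.randrange(2)              # i.i.d. uniform moves: pi = (1/2, 1/2)
    while True:
        # Real time spent computing the next state; its mean depends on x.
        t += rng.expovariate(1.0 / mean_hold[x])
        if t > T:
            return x                  # the clock ran out while x was current
        x = rng.randrange(2)

def anytime_frequency(T=50.0, mean_hold=(1.0, 3.0), reps=20000, seed=1):
    """Fraction of independent runs that are found in state 1 at time T."""
    rng = random.Random(seed)
    ones = sum(state_at_time(T, mean_hold, rng) for _ in range(reps))
    return ones / reps

if __name__ == "__main__":
    print(anytime_frequency())                        # unequal hold times
    print(anytime_frequency(mean_hold=(2.0, 2.0)))    # constant hold times
```

Under these assumed values, the frequency of state 1 at the stopping time should be close to 0.5 × 3 / (0.5 × 1 + 0.5 × 3) = 0.75 rather than the 0.5 of the underlying chain, while equal mean hold times should bring it back to about 0.5.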

zig, zag, and subsampling

Posted in Books, Statistics, University life on December 29, 2016 by xi'an

Today, I alas missed a seminar at BiPS on the Zig-Zag (sub-)sampler of Joris Bierkens, Paul Fearnhead and Gareth Roberts, presented here in Paris by James Ridgway. Fortunately for me, I had some discussions with Murray Pollock in Warwick and then again with Changye Wu in Dauphine that shed some light on this complex but highly innovative approach to simulating in Big Data settings thanks to a correct subsampling mechanism.

The zig-zag process runs a continuous process made of segments that turn from one diagonal to the next at random times driven by a generator connected with the components of the gradient of the target log-density. Plus a symmetric term. Provided those random times can be generated, this process is truly available and associated with the right target distribution. When the components of the parameter are independent (an unlikely setting), those random times can be associated with an inhomogeneous Poisson process. In the general case, one needs to bound the gradients by more manageable functions that create a Poisson process that can later be thinned. Next, one needs to simulate the process for the upper bound, a task that seems hard to achieve apart from linear and piecewise constant upper bounds. The process has a bit of a slice sampling taste, except that it cannot be used as a slice sampler but requires continuous time integration, given that the length of each segment matters. (Or maybe random time subsampling?)
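
As an illustration of the simplest case, here is a minimal sketch of a one-dimensional zig-zag process, assuming a standard normal target, unit speed, and no additional symmetric (refreshment) term. With U(x) = x²/2 the switching rate along a segment is max(0, θx + s), which can be integrated and inverted in closed form, so no bounding or thinning is needed. The function name `zigzag_gaussian` and its defaults are hypothetical.

```python
import math
import random

def zigzag_gaussian(n_events=20000, seed=0):
    """Time-averaged first and second moments along a 1-d zig-zag path.

    Assumed target: N(0, 1), so the switching rate is max(0, theta * x).
    """
    rng = random.Random(seed)
    x, theta = 0.0, 1.0
    total_time = m1 = m2 = 0.0
    for _ in range(n_events):
        a = theta * x
        e = rng.expovariate(1.0)
        # Solve  integral_0^tau max(0, a + s) ds = e  for the switching time.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of x and x^2 along the segment x + theta * s.
        m1 += x * tau + theta * tau ** 2 / 2.0
        m2 += x * x * tau + x * theta * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += theta * tau              # move to the end of the segment
        theta = -theta                # flip the direction
    return m1 / total_time, m2 / total_time

if __name__ == "__main__":
    print(zigzag_gaussian())
```

The averages are integrals over the whole continuous path rather than sums over the switching points, which is why the length of each segment matters.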

A highly innovative part of the paper concentrates on Big Data likelihoods and on the possibility to subsample properly and exactly the original dataset. The authors propose Zig-Zag with subsampling by turning the gradients into random parts of the gradients. While remaining unbiased. There may be a cost associated with this gain of one to n, namely that the upper bounds may turn larger as they handle all elements in the likelihood at once, hence become (even) less efficient. (I am more uncertain about the case of the control variates, as it relies on a Lipschitz assumption.) While I still miss an easy way to implement the approach in a specific model, I remain hopeful for this new approach to make a major dent in the current methodologies!
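
A minimal sketch of the subsampling idea, under strong simplifying assumptions that are mine rather than the paper's: observations y_i ~ N(x, 1) with a flat prior on x, so that the gradient of the negative log-posterior is Σ(x − y_i) and n(x − y_J), with J uniform on the data indices, is an unbiased estimate of it. The linear bound, the absence of control variates, and the name `zigzag_subsampled` are all choices made for illustration.

```python
import math
import random

def zigzag_subsampled(data, n_proposals=50000, seed=0):
    """Time-averaged mean and variance from a 1-d zig-zag with subsampling.

    Assumed target: N(mean(data), 1/n), i.e. unit-variance Gaussian
    likelihood with a flat prior.
    """
    rng = random.Random(seed)
    n = len(data)
    y_min, y_max = min(data), max(data)
    x, theta = sum(data) / n, 1.0
    total = m1 = m2 = 0.0
    for _ in range(n_proposals):
        # Linear bound a + n*s dominating every single-observation rate
        # n * max(0, theta * (x + theta*s - y_j)) along the current segment.
        y_ext = y_min if theta > 0 else y_max
        a = n * max(0.0, theta * (x - y_ext))
        e = rng.expovariate(1.0)
        # First event of a Poisson process with rate a + n*s.
        tau = (-a + math.sqrt(a * a + 2.0 * n * e)) / n
        m1 += x * tau + theta * tau ** 2 / 2.0
        m2 += x * x * tau + x * theta * tau ** 2 + tau ** 3 / 3.0
        total += tau
        x += theta * tau
        # Thinning step that looks at one randomly chosen observation only.
        y_j = data[rng.randrange(n)]
        rate = n * max(0.0, theta * (x - y_j))
        if rng.random() * (a + n * tau) < rate:
            theta = -theta
    mean = m1 / total
    return mean, m2 / total - mean ** 2

if __name__ == "__main__":
    toy_data = [i / 10 for i in range(-10, 10)]
    print(zigzag_subsampled(toy_data))
```

Each proposed switch touches a single observation, but the bound has to dominate the rates of all observations at once, which is where the loss of efficiency mentioned above would come from.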