## Archive for plagiarism

## SDSS with friends

Posted in Statistics with tags ASA, climate science, Computational Statistics and Data Analysis, Ed Wegman, hockey stick, plagiarism, SDSS, Symposium on Data Science & Statistics, WIREs on May 4, 2018 by xi'an

When browsing the April issue of Amstat News over lunch, I came upon this page advertising rather loudly next month's SDSS symposium. And realised that it not only features “perhaps the most prominent statistician to have repeatedly published material written by others without attribution” (a quote from Gelman and Basbøll, 2013, in American Scientist), namely Ed Wegman, as guest of honor, but also one co-author of a retracted Computational Statistics paper [still included in Wegman's list of publications] as program chair and another co-author of the plagiarised “Hockey Stick” report as plenary speaker. A fairly friendly reunion, then, if “networking” is to be understood this way, except that this is a major conference, supported by the ASA and other organisations. Rather shocking, isn't it?! (The entry also made me realise that the three co-authors were the original editors of WIREs, before Wegman and Said withdrew in 2012.)

## Bayesian filtering and smoothing [book review]

Posted in Books, Statistics, Travel, University life with tags book review, CHANCE, EM algorithm, filtering, IMS Textbooks, Kalman filter, MAP estimators, particle filter, particle MCMC, plagiarism, Simo Särkkä, smoothing, The Monty Hall problem on February 25, 2015 by xi'an

When in Warwick last October, I met Simo Särkkä, who told me he had published an IMS monograph on Bayesian filtering and smoothing the year before. I thought it would be an appropriate book to review for CHANCE and tried, unsuccessfully, to get a copy from Oxford University Press. I thus bought my own copy, which I received two weeks ago, and took the opportunity of my Czech vacations to read it… *[A warning pre-empting accusations of self-plagiarism: this is a preliminary draft of a review to appear in CHANCE under my true name!]*

“From the Bayesian estimation point of view both the states and the static parameters are unknown (random) parameters of the system.” (p.20)

Bayesian filtering and smoothing is an introduction to the topic that essentially starts from ground zero. Chapter 1 motivates the use of filtering and smoothing through examples and highlights the naturally Bayesian approach to the problem(s). Two graphs illustrate the difference between filtering and smoothing by plotting the successive confidence bands for the same series of observations. The performances are obviously poorer with filtering, but the text does not stress that those intervals are point-wise rather than joint, i.e., that the graphs do not provide a genuine confidence band. (The exercise section of that chapter is superfluous in that it suggests re-reading Kalman's original paper and rephrases the Monty Hall paradox in a story unconnected with filtering!) Chapter 2 gives an introduction to Bayesian statistics in general, with a few pages on Bayesian computational methods. A first remark is that the above quote is both correct and mildly confusing, in that the static parameters can be consistently estimated while the latent states cannot. A second remark is that justifying the MAP estimator as associated with the 0-1 loss is incorrect in continuous settings. The third chapter deals with the recursive updating of the posterior distribution, i.e., the fact that the posterior at time t becomes the prior at time t+1, with applications to state-space systems including the Kalman filter. The fourth to sixth chapters concentrate on this Kalman filter and its extensions, and I find them somewhat unsatisfactory in that the collection of such filters is overwhelming for a neophyte. No assessment of the estimation error when the model is misspecified appears at this stage. And, as usual, I find the unscented Kalman filter hard to fathom! The same feeling applies to the smoothing chapters, Chapters 8 to 10, which mimic the earlier ones. Continue reading
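To make the filtering-versus-smoothing contrast discussed above concrete, here is a minimal sketch (not code from the book) of a one-dimensional Kalman filter followed by a Rauch-Tung-Striebel smoothing pass, on an invented local-level model x_t = x_{t-1} + w_t, y_t = x_t + v_t; all parameter values are assumptions for the sake of illustration.

```python
import numpy as np

def kalman_filter_smoother(y, Q=0.1, R=0.5, m0=0.0, P0=1.0):
    """Kalman filter + RTS smoother for the local-level model
    x_t = x_{t-1} + w_t,  y_t = x_t + v_t,  w~N(0,Q), v~N(0,R)."""
    n = len(y)
    m_pred = np.empty(n); P_pred = np.empty(n)   # one-step predictive moments
    m_filt = np.empty(n); P_filt = np.empty(n)   # filtering moments
    m, P = m0, P0
    for t in range(n):
        # prediction step: random-walk dynamics add Q to the variance
        m_pred[t], P_pred[t] = m, P + Q
        # update step with observation y[t]
        K = P_pred[t] / (P_pred[t] + R)          # Kalman gain
        m = m_pred[t] + K * (y[t] - m_pred[t])
        P = (1.0 - K) * P_pred[t]
        m_filt[t], P_filt[t] = m, P
    # backward (Rauch-Tung-Striebel) smoothing pass
    m_smooth = m_filt.copy(); P_smooth = P_filt.copy()
    for t in range(n - 2, -1, -1):
        G = P_filt[t] / P_pred[t + 1]            # smoother gain
        m_smooth[t] = m_filt[t] + G * (m_smooth[t + 1] - m_pred[t + 1])
        P_smooth[t] = P_filt[t] + G**2 * (P_smooth[t + 1] - P_pred[t + 1])
    return m_filt, P_filt, m_smooth, P_smooth
```

On this model the smoothed variances never exceed the filtered ones, which is precisely the narrowing of the point-wise bands between the two graphs mentioned above: the smoother conditions on the whole series, the filter only on the past.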

## Le Monde on E. Wegman

Posted in Statistics with tags climate change, climatosceptic, computational statistics, CSDA, Ed Wegman, Michael Mann, plagiarism on December 31, 2011 by xi'an

In addition to the solution to the wrong problem, *Le Monde* of last weekend also dedicated a full page of its Science leaflet to the coverage of Michael Mann's hockey-stick curve of temperature increase and the hard time climato-skeptics have given him since its publication in 1998… The page includes an insert on Ed Wegman's 2006 [infamous] report for the U.S. Congress, amply documented on Andrew's blog, and mentions the May 2011 editorial of *Nature* on the plagiarism investigation. (I reproduce it above as it is not available on the *Le Monde* website.)

## An ethical issue

Posted in Statistics, University life with tags ethics, fraud, PhD thesis, plagiarism on November 19, 2011 by xi'an

A few weeks ago, I was asked to act as an external referee for a PhD thesis. This thesis involved some improvement upon standard statistical methodology, with applications to another field. When I eventually got the PhD document, I discovered that it started with a preface (written by the PhD student) claiming that the student's work had been used by co-workers, including the PhD supervisor, and published in a refereed journal without the student's name or agreement, and moreover with some fabricated data… This was quite a shock, as I had not been made aware of this super-delicate issue *a priori*. And I had no information on the published piece of work, which seemed to be in the other field (I have not been able to find it since then). When I complained to the university, I was transferred to the dean of graduate studies, who almost immediately withdrew the request for a PhD evaluation [by me]…

I find the whole affair quite bizarre, and somewhat perturbing. Indeed, when I recontacted the university to mention my concerns, I got the following [edited and possibly translated] email:

As I’m sure you can appreciate, this is an unusual case. [We were] not able to alert you to this when nominating you as examiners, as it is important that we follow our University process and allow examiners to reach independent conclusions as to the value of the work before them. [We are] bound by our PhD Statute and would be prejudicing the examination process if [we] provided additional information to examiners. [We] would also be providing a route for the candidate to appeal the outcome of the examination process.

This does not make any sense to me, given that any referee of this thesis will hit the same issue when reading its first pages… Either the PhD student should remove this complaint from the PhD document (but this does not seem right, given that there *is* a published paper containing some of the results claimed in the thesis, even though referees from Statistics are very unlikely to be aware of it, as, again, I could not find the corresponding paper), or the whole information should be provided to the referees of the thesis so that they can judge the matter in full light… I do not see how I could pursue the matter any further, but the whole story left me feeling quite uncomfortable.

## R [re-]exam

Posted in Books, R, Statistics, University life with tags exam, graduate course, I love R, plagiarism, R on March 28, 2011 by xi'an

In what seems like an endless cuRse, I found this week that I had to re-grade a dozen R exams a TA did not grade properly! The grades I gave (X) are plotted below against those of my TA (Y). There is little connection between the two gradings… As if this was not enough trouble, I also found exactly duplicated R code in another R project around **Introducing Monte Carlo Methods with R** that was returned a few weeks ago. Meaning I will have to draft a second-round exam… (As Tom commented on an earlier post, team resolution of a given problem may be a positive approach, but in the current case one student provided an A⁺⁺ answer, while two others clearly drafted a hasty copy from the original.) Nonetheless, do not worry, I still love [teaching] R!

## Top ten on HAL

Posted in Statistics, University life with tags HAL, plagiarism, population Monte Carlo on June 24, 2010 by xi'an

I was updating my entries on HAL from my arXives and found this top ten ranking of my papers:

- Sélection bayésienne de variables en régression linéaire, with A. Guillin and J.-M. Marin inria-00077857
- Adaptive Importance Sampling in General Mixture Classes, with O. Cappé, R. Douc, A. Guillin and J.-M. Marin inria-00181474/hal-00180669
- A Bayesian reassessment of nearest-neighbour classification, with L. Cucala, J.-M. Marin and D.M. Titterington inria-00143783
- Deviance Information Criteria for Missing Data Model, with G. Celeux, F. Forbes and D.M. Titterington inria-00071724
- Minimum variance importance sampling via Population Monte Carlo, with R. Douc, A. Guillin and J.-M. Marin inria-00070316
- Computational and Inferential Difficulties with Mixture Posterior Distributions, with J.-M. Marin inria-00073049
- Are risk averse agents more optimistic? A Bayesian estimation approach, with S. Benmansour, E. Jouini, C. Napp and J.-M. Marin halshs-00163678
- Convergence of adaptive sampling schemes, with R. Douc, A. Guillin and J.-M. Marin inria-00070522
- Brownian Confidence Bands on Monte Carlo Output, with W. Kendall and J.-M. Marin inria-00070571
- Iterated importance sampling in missing data problems, with G. Celeux and J.-M. Marin inria-00070473

Nothing much to comment on, except that those are all recent papers (obviously, since HAL is itself a recent creation), a large majority of which revolve around population Monte Carlo (and almost all co-authored with Jean-Michel Marin!). The #9, with Wilfrid Kendall and Jean-Michel Marin, is clearly very popular, as someone attempted to plagiarise it! The #1 comes as a real surprise, given that it is in French and more of a survey.

## plagiarism exposed!

Posted in R, Statistics, University life with tags Bayesian non-parametrics, beta mixtures, C, mixture estimation, plagiarism, R on June 15, 2010 by xi'an

Last morn, I had the surprise of receiving the following email:

This is to inform you that the following abstract has been submitted to the 3rd International Conference of the ERCIM WG on COMPUTING & STATISTICS (ERCIM’10)

Ab#: 114

Title: Goodness of Fit Via Mixtures of Beta distributions

Keywords: nonparametric estimation, posterior conditional predictive p-value.

Abstract: We consider a Bayesian approach to goodness of fit, that is, to the problem of testing whether or not a given parametric model is compatible with the data at hand . Since we are concerned with a goodness of fit problem, it is more of interest to consider a functional distance to the tested model d(F;F) as the basis of our test, rather than the corresponding Bayes factor, since the later puts more emphasis on the parameters. It is both of high interest and of strong difficulty to come up with a satisfactory notion of a Bayesian test for goodness ofit to a distribution or to a family of distributions.

The abstract is a plagiarism of your work.

I am informing you about this in case the author has tried to plagiarize the whole paper. The same author has submitted a second abstract plagiarizing another paper. The author uses bogus affiliations and I cannot trace his institution, in case he has one.

It is somewhat comforting to see that such a gross example of plagiarism can get detected, despite the fact that our paper never got published. Although I am sure there must be conferences that do not apply any filter to submissions…

This paper with Judith Rousseau was once submitted to Series B, but I never managed to complete the requested revision, for programming reasons: the task of modifying the several thousand lines of C code driving the beta mixture estimation filled me with paralysing dread! This is actually when I stopped programming in C (whether I ever really programmed in C is actually debatable!). This is unfortunate, as the spirit of the paper was quite nice, using an idea borrowed from Verdinelli and Wasserman to build a genuine Bayesian goodness of fit test… I do not think there is much to salvage at this late stage, given the explosion of Bayesian non-parametrics.