Archive for bibliometrics

stop the rot!

Posted in Statistics on September 26, 2017 by xi'an

Two entries in Nature this week about predatory journals, both from the Ottawa Hospital Research Institute. The first comes from the publication officer at the Institute, whose role is “dedicated to educating researchers and guiding them in their journal submission”, and tells the tale of a senior scientist discovering that a paper submitted to a predatory journal, and later rescinded, was nonetheless published by the said journal. Which reminded me of a similar misadventure of mine a few years ago. After our discussion of an earlier paper in The American Statistician was rejected by that very journal, my PhD student Kaniav Kamary and I resubmitted it to the Journal of Applied & Computational Mathematics, from which I had received an email a few weeks earlier asking me in flowery terms for a paper. When the paper got accepted as such two days after submission, I grew alarmed and realised this was a predatory journal, whose title played on the quasi-homonymous Journal of Computational and Applied Mathematics (Elsevier) and International Journal of Applied and Computational Mathematics (Springer). Just like the authors in the above story, we wrote back to the editors, telling them we were rescinding our submission, but never received a reply or a request for copyright transfer. Instead, requests for (diminishing) payments were regularly sent to us for almost a year, until they ceased. In the meantime, the paper had been posted on the “journal” website, and no further email of ours, including some from our University legal officer, induced any reply or action from the journal…

The second article in Nature is from a group of epidemiologists at the same institute, producing statistics about biomedical publications in predatory journals (characterised as such by the now defunct Beall blacklist). They are much more vehement about the danger represented by these journals, whose “articles we examined were atrocious in terms of reporting”, and about the authors submitting to them, deemed unethical for wasting human and animal observations. The authors of this article identify thirteen characteristics for spotting predatory journals, the first one being “low article-processing fees”, although our own misadventure points the opposite way. And they call for tighter control and auditing from the funding institutions over their researchers… Besides adding an extra layer of bureaucracy, I fear this is rather naïve, as if the boundary between predatory and non-predatory journals were crystal clear, rather than a murky continuum. And it puts the blame solely on the researchers, rather than sharing it with institutions ever eager to push their bibliometrics towards more automation in the assessment of their researchers.

a discovery that the mean can be impacted by extreme values

Posted in University life on August 6, 2016 by xi'an

A surprising editorial in Nature about the misleading use of impact factors which, being means, are heavily impacted by extreme values. With the realisation that the mean is not the median for skewed distributions…
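To make the point concrete, here is a minimal R sketch with made-up Poisson citation counts (not actual journal data): a single heavily cited paper drags the mean, i.e. the impact factor, while the median does not budge.

```r
# Simulated citation counts for 200 papers from one journal
set.seed(1)
cites <- rpois(200, lambda = 2)                # typical papers, about two citations each
c(mean = mean(cites), median = median(cites))  # both near 2
cites[1] <- 500                                # one blockbuster outlier
c(mean = mean(cites), median = median(cites))  # mean more than doubles, median unchanged
```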

To be fair(er), Nature published a subsequent paper this week about publishing additional metrics like the two-year median.

Le Monde puzzle [#851]

Posted in Books, Kids, Statistics, University life on February 6, 2014 by xi'an

A more unusual Le Monde mathematical puzzle:

Fifty two-sided tokens, black on one face and white on the other, are set on an equilateral triangle of side 9, black side up. If tokens can only be flipped three at a time, determine whether it is possible to produce a triangle with all white sides up, under each of the following constraints:

  • the three tokens must stand on a line;
  • the three tokens must stand on a line and be contiguous;
  • the three tokens must stand at the vertices of an equilateral triangle;
  • the three tokens must stand at the vertices of an equilateral triangle of side one.
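For readers who wish to experiment, here is a minimal and entirely generic R scaffold, with the loud caveat that the encoding of the triangle geometry is left open: the list of admissible triples below is a mere placeholder, to be filled in for each of the four constraints.

```r
# Generic random-search sketch: tokens coded 1 (black up) or 0 (white up),
# a move flips one admissible triple of tokens. The list `triples` must be
# replaced by the index triples allowed under the chosen constraint; with
# the two placeholder triples below the search cannot succeed, of course.
state   <- rep(1, 50)                       # all tokens black side up
triples <- list(c(1, 2, 3), c(2, 3, 4))     # placeholder moves only
flip <- function(s, t) { s[t] <- 1 - s[t]; s }
set.seed(851)
for (i in 1:1e5) {                          # blind random search
  state <- flip(state, triples[[sample(length(triples), 1)]])
  if (all(state == 0)) { cat("all white after", i, "flips\n"); break }
}
```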

Beyond the brute-force scaffold above, I could not think of a quick fix with an R code, so I leave the puzzle to the interested 'Og reader…

In the next issue of the Science&Médecine leaflet (Jan. 29), which appeared while I was in Warwick, there were a few entries of interest. First, the central article was about Big Data (again) but, for a change, the journalist took pains to include French statisticians and machine learners in the picture, like Stefan Clemençon, Aurélien Garivier, Jean-Michel Loubes, and Nicolas Vayatis. (In a typical French approach, the subtitle was “A challenge for maths”, rather than for statistics!) Ignoring the (minor) confusion therein of “small n, large p” problems with the curse of dimensionality, the article does mention a few important issues, like distributed computing, inhomogeneous datasets, overfitting, and learning. There are also links to the new masters in data sciences at ENSAE, Telecom-Paristech, and Paris 6-Pierre et Marie Curie. (The one in Paris-Dauphine is still under construction and will not open next year.)

In a side column, the journal also wonders about the “end of Science” due to the massive data influx and to “Big Data” techniques that could predict and explain without requiring theories or deductive, scientific thinking. Somewhat paradoxically, the column ends with a quote from Jean-Michel Loubes, who states that one could think “our” methods start from effects to end up with causes, but that in fact the models are highly dependent on the data. And on the opinion of experts. Doesn't that suggest some Bayesian principles at work there?!

Another column is dedicated to Edward Teller's “dream” of using nuclear bombs for civil engineering, as in the Chariot project in Alaska. And the last entry argues against Kelvin's “to measure is to know”, under the title “To know is not to measure”; it does not aim at a general philosophical level but rather objects to the unrestricted intrusion of bibliometrics and other indices borrowed from marketing. Written by a mathematician, this column is directed not against statistics and the Big Data revolution, but against the myth that everything can be measured and quantified. (There was also a pointer to a tribune against the pseudo-recruiting of top researchers by Saudi universities in order to improve their Shanghai ranking, but I do not have time to discuss it here and now. Maybe later.)

Microsoft wrote me an email

Posted in University life on November 23, 2011 by xi'an

I received the following and unsolicited email today from Microsoft Research:

Dear Christian,
Microsoft Research would like to tell you about Microsoft Academic Search (MAS) a search engine to explore publications, authors, conferences, journals and their relationships. Based on our data mining algorithm and data on the web, MAS has aggregated some of your information here.
This is our initial coverage into such academic area, we understand that our coverage is very limited, therefore the aggregated information might not be 100% correct or complete. We are working on finding and processing more data, better name disambiguation, and other enhancements. While you’re here, please check out interactive features like relationship path, and public APIs if you’re interested in using our data set in your research work.
We would love to hear your thoughts about how MAS can help your research and work. It would be great if you can take some time to fill out this short anonymous survey.
Best regards,
Microsoft Academic Search Team

which I find rather astounding, in the sense that the MAS team is basically asking me to correct the inaccuracies in a bibliometric tool I am not interested in! (The link to the survey was not working, not that I was particularly excited about answering it! And there is no direct way to correct the information contained in the file, as opposed to Google Scholar Citations…)

Automated promotion

Posted in University life on December 9, 2010 by xi'an

Olivier Cappé pointed me to this reference, where Cyril Labbé explains how to achieve a high ranking on Google Scholar with (fake) automatically generated papers… Of course, the ranking does not stand up to any close examination, but nonetheless…

Citation abuses

Posted in Statistics on October 21, 2009 by xi'an

“There is a belief that citation statistics are inherently more accurate because they substitute simple numbers for complex judgments, and hence overcome the possible subjectivity of peer review. But this belief is unfounded.”

A very interesting report about bibliometrics and its abuses (or “bibliometrics as an abuse per se”!) appeared in the latest issue of Statistical Science. It was commissioned by the IMS, the IMU, and the ICIAM. Along with the set of comments (by Bernard Silverman, David Spiegelhalter, Peter Hall, and others) also posted on arXiv, it is a must-read!

“even a casual inspection of the h-index and its variants shows that these are naïve attempts to understand complicated citation records. While they capture a small amount of information about the distribution of a scientist's citations, they lose crucial information that is essential for the assessment of research.”

The issue is not gratuitous. While having Series B ranked with a high impact factor is an indicator of the relevance of a majority of the papers published in the journal, there are deeper and more important issues at stake. Our grant allocations, our promotions, our salaries are more and more dependent on these “objective” summaries or “comprehensive” factors. The misuse of bibliometrics stems from government bodies and other funding agencies wishing to come up with assessments of the quality of a researcher that bypass peer review and, more to the point, are easy to come by.

The report points out the many shortcomings of journal impact factors. Their two-year horizon is very short-sighted in mathematics and statistics. As an average, the impact factor is strongly influenced by outliers, like controversial papers or broad surveys, as shown by its yearly variations. Commercial databases like Thomson's miss a large part of the journals that could cite a given paper, which is particularly true for fields at the interface between disciplines and for emergent topics. The variation in magnitude between disciplines is enormous: based on the impact factor, I'd rather publish one paper in Bioinformatics than four in the Annals of Statistics…

A second issue is that the “quality” of the journal does not automatically extend to all papers it publishes: weighting papers by the journal impact factor thus ignores within-journal variation to an immense extent. The report illustrates this with the fact that a paper published in a journal with half the impact factor of another journal has a 62% probability of being more cited than if it had been published in that other journal! The h-index is similarly criticised by the report. More fundamentally, the report also analyses the multicriteria nature of citations, which cannot be reflected (only) as a measure of the worth of the cited papers.
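As a small illustration of the report's point on the h-index, here is a sketch in R with invented citation records (not data from the report): two records with wildly different citation distributions collapse to the very same index.

```r
# h-index: the largest h such that h papers have at least h citations each
h_index <- function(cites) {
  s <- sort(cites, decreasing = TRUE)
  sum(s >= seq_along(s))
}
flat  <- c(10, 9, 8, 7, 6, 5, 4)        # evenly cited record
spiky <- c(500, 200, 100, 50, 6, 2, 1)  # record dominated by a few blockbusters
h_index(flat)   # 5
h_index(spiky)  # 5: same index, very different records
```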